
Deep Learning Models


AUTOENCODERS

  1. - using keras’ functional API

  2. - regular, deep, sparse, regularized, cnn, variational

    1. A keras.io blog post, but it explains AEs quite nicely.

  3. On PCA vs AE: basically some info about what PCA does (maximizing variance, then projecting) and then what an AE does, and how it can achieve similar but non-linear dense representations.

  4. summarized in the KPCA section of this notebook. + +xchange

  5. BART, sequence-to-sequence pre-training for natural language generation, translation, and comprehension.
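
A minimal sketch of a dense autoencoder using Keras' functional API (assuming tf.keras; the 784-dim input and 32-dim code are illustrative choices, not from the original notes):

from tensorflow import keras

# Encoder: compress a 784-dim input (e.g., a flattened 28x28 image) into a 32-dim code.
inputs = keras.Input(shape=(784,))
encoded = keras.layers.Dense(32, activation="relu")(inputs)
# Decoder: reconstruct the input from the code.
decoded = keras.layers.Dense(784, activation="sigmoid")(encoded)

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, ...)   # input == target: the network learns to reconstruct
encoder = keras.Model(inputs, encoded)     # the encoder alone gives the dense low-dim representation

Unlike PCA, the non-linear activations let the code capture non-linear structure, which is the point made in the PCA-vs-AE items above.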

Variational AE

SELF ORGANIZING MAPS (SOM)

  1. Git

NEURO EVOLUTION (GA/GP based)

NEAT

NEAT implements the idea that it is most effective to start evolution with small, simple networks and allow them to become increasingly complex over generations.

That way, just as organisms in nature increased in complexity since the first cell, so do neural networks in NEAT.

This process of continual elaboration allows finding highly sophisticated and complex neural networks.

HYPER-NEAT

HyperNEAT is based on a theory of representation that hypothesizes that a good representation for an artificial neural network should be able to describe its pattern of connectivity compactly.

Radial Basis Function Network (RBFN)

  • An RBFN performs classification by measuring the input’s similarity to examples from the training set.

  • Each RBFN neuron stores a “prototype”, which is just one of the examples from the training set.

  • When we want to classify a new input, each neuron computes the Euclidean distance between the input and its prototype.

  • Roughly speaking, if the input more closely resembles the class A prototypes than the class B prototypes, it is classified as class A.
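
A tiny numpy sketch of that scoring rule (the prototypes, beta widths and class weights below are illustrative assumptions):

import numpy as np

def rbfn_predict(x, prototypes, betas, weights):
    # Each neuron's activation decays with the Euclidean distance to its stored prototype.
    dists = np.linalg.norm(prototypes - x, axis=1)   # distance to every prototype
    activations = np.exp(-betas * dists ** 2)        # Gaussian RBF response
    scores = activations @ weights                   # weighted sum per class
    return np.argmax(scores)                         # class whose prototypes match best

# Toy example: two prototypes per class, 2-D inputs.
prototypes = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 6.0]])
betas = np.ones(4)
weights = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])   # rows: neurons, cols: classes A/B
print(rbfn_predict(np.array([0.5, 0.5]), prototypes, betas, weights))  # -> 0 (class A)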

Bayesian Neural Network (BNN)

Under the BNN framework, prediction uncertainty can be categorized into three types:

  1. Model uncertainty captures our ignorance of the model parameters and can be reduced as more samples are collected.

  2. Model misspecification.

  3. Inherent noise captures the uncertainty in the data-generation process and is irreducible.

Note: in a series of articles, Uber explains its time-series work and arrives at a BNN architecture.

A vanilla LSTM did not work properly, so an encoder-decoder architecture was used (see below).

Regarding point 1: 'run prediction with dropout 100 times'.

Is it applicable for time series? In the referenced figure, the task is to predict the missing signal between each pair of dotted lines; A is a bad estimation, but with a dropout layer we can see that in most cases the signal is better predicted.

Going back to Uber: they actually use this idea to predict time series with LSTMs, using an encoder-decoder framework.

Note: this is probably applicable in other types of networks.

import keras

inputs = keras.Input(shape=(10,))
x = keras.layers.Dense(3)(inputs)
# training=True keeps dropout active at prediction time (Monte Carlo dropout)
outputs = keras.layers.Dropout(0.5)(x, training=True)
model = keras.Model(inputs, outputs)
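
With dropout kept active, uncertainty can then be estimated by repeating the forward pass many times and looking at the spread. A sketch (the random x_new batch is a placeholder):

import numpy as np

x_new = np.random.rand(5, 10).astype("float32")  # placeholder batch matching the (10,) input above
# Dropout stays active because of training=True, so every pass gives a different prediction.
preds = np.stack([model.predict(x_new, verbose=0) for _ in range(100)])
mean_prediction = preds.mean(axis=0)   # averaged prediction
uncertainty = preds.std(axis=0)        # spread across dropout masks ~ a confidence measure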

CONVOLUTIONAL NEURAL NET

  • The Convolution Layer's primary purpose is to extract features from the input image. Convolution preserves the spatial relationship between pixels by learning image features using small squares of input data.

  • ReLU (more in the activation chapter) - The purpose of ReLU is to introduce non-linearity in our ConvNet

  • Spatial Pooling (also called subsampling or downsampling) reduces the dimensionality of each feature map but retains the most important information. Spatial Pooling can be of different types: Max, Average, Sum etc.

  • Dense / Fully Connected - a traditional Multi Layer Perceptron that uses a softmax activation function in the output layer to classify. The output from the convolutional and pooling layers represent high-level features of the input image. The purpose of the Fully Connected layer is to use these features for classifying the input image into various classes based on the training dataset.
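
A minimal Keras sketch of the layer stack just described (assuming tf.keras; the input size, filter counts and the 4 output classes are illustrative):

from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),                       # a small RGB image
    keras.layers.Conv2D(16, (3, 3), activation="relu"),   # convolution + ReLU non-linearity
    keras.layers.MaxPooling2D((2, 2)),                    # spatial (max) pooling
    keras.layers.Conv2D(32, (3, 3), activation="relu"),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(4, activation="softmax"),          # fully connected classifier over 4 classes
])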

The overall training process of the Convolutional Network may be summarized as below:

  • Step 1: We initialize all filters and parameters / weights with random values.

  • Step 2: The network takes a single training image as input, goes through the forward propagation step (convolution, ReLU and pooling operations, along with forward propagation in the fully connected layer) and finds the output probabilities for each class.

    • Let's say the output probabilities for a boat image are [0.2, 0.4, 0.1, 0.3].

    • Since weights are randomly assigned for the first training example, output probabilities are also random.

  • Step 3: Calculate the total error at the output layer (summation over all 4 classes):

    • (L2) Total Error = ∑ ½ (target probability – output probability)²

  • Step 4: Use Backpropagation to calculate the gradients of the error with respect to all weights in the network, and use gradient descent to update all filter values / weights and parameter values to minimize the output error.

    • The weights are adjusted in proportion to their contribution to the total error.

    • When the same image is input again, output probabilities might now be [0.1, 0.1, 0.7, 0.1], which is closer to the target vector [0, 0, 1, 0].

    • This means that the network has learnt to classify this particular image correctly by adjusting its weights / filters such that the output error is reduced.

    • Parameters like the number of filters, filter sizes, architecture of the network etc. have all been fixed before Step 1 and do not change during the training process – only the values of the filter matrices and connection weights get updated.

  • Step 5: Repeat steps 2-4 with all images in the training set.

The above steps train the ConvNet – this essentially means that all the weights and parameters of the ConvNet have now been optimized to correctly classify images from the training set.

When a new (unseen) image is input into the ConvNet, the network would go through the forward propagation step and output a probability for each class (for a new image, the output probabilities are calculated using the weights which have been optimized to correctly classify all the previous training examples). If our training set is large enough, the network will (hopefully) generalize well to new images and classify them into correct categories.

Ways to handle class imbalance in CNNs (see the sketch after this list):

  1. Oversampling

  2. Undersampling

  3. Thresholding probabilities (ROC?)

  4. Cost-sensitive classification - a different cost for each type of misclassification

  5. One class - novelty detection. This is a concept-learning technique that recognizes positive instances rather than discriminating between two classes.

The results indicate (loosely) that oversampling is usually better in most cases, and doesn't cause overfitting in CNNs.
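
Cost-sensitive classification (point 4) is the easiest to try in Keras via class weights; a sketch using sklearn's balanced formula (the toy labels are an assumption for illustration):

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y_train = np.array([0] * 90 + [1] * 10)        # toy labels: class 1 is 9x rarer
classes = np.unique(y_train)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)
class_weight = dict(zip(classes, weights))      # {0: ~0.56, 1: 5.0}
# model.fit(x_train, y_train, class_weight=class_weight)  # errors on the rare class cost more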

CONV-1D

1x1 CNN

    1. “This is the most common application of this type of filter and in this way, the layer is often called a feature map pooling layer.”

    2. “In the paper, the authors propose the need for an MLP convolutional layer and the need for cross-channel pooling to promote learning across channels.”

    3. “the 1×1 filter was used explicitly for dimensionality reduction and for increasing the dimensionality of feature maps after pooling in the design of the inception module, used in the GoogLeNet model”

    4. “The 1×1 filter was used as a projection technique to match the number of filters of input to the output of residual modules in the design of the residual network “
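
A sketch of the dimensionality-reduction use of the 1×1 filter (assuming tf.keras; the 256-to-64 channel counts are illustrative):

from tensorflow import keras

x = keras.Input(shape=(28, 28, 256))                        # 256 feature maps
# A 1x1 convolution mixes channels per spatial position, shrinking 256 maps down to 64.
reduced = keras.layers.Conv2D(64, (1, 1), activation="relu")(x)
model = keras.Model(x, reduced)                             # output shape: (None, 28, 28, 64)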

MASK R-CNN

Invariance in CNN

MAX AVERAGE POOLING

  1. A max-pool layer compresses by taking the maximum activation in a block. If you have a block with mostly small activations but a small patch of large activation, you lose the information about the low activations. I think of this as saying "this type of feature was detected in this general area".

  2. A mean-pool layer compresses by taking the mean activation in a block. If large activations are balanced by negative activations, the overall compressed activation will look like no activation at all. On the other hand, you retain some information about low activations from the previous example.

  3. MAX pooling, in other words, roughly means that only those features that most strongly trigger outputs are used in the subsequent layers. You can look at it a little like focusing the network's attention on what's most characteristic of the image at hand. (A tiny numeric illustration follows this list.)
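
A tiny numeric illustration of the difference on a single 2x2 block:

import numpy as np

block = np.array([[0.1, 0.2],
                  [0.1, 9.0]])    # mostly small activations plus one large spike
print(block.max())                 # 9.0  - max pooling: "the feature was detected here"
print(block.mean())                # 2.35 - mean pooling dilutes the spike but keeps low-activation info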

Dilated CNN

Graph Convolutional Networks

CAPSULE NEURAL NETS

Transfer Learning using CNN

  1. To add: Keras book, chapter 5 (I think). Four ways to use a pre-trained model:

    1. Classifier: The pre-trained model is used directly to classify new images.

    2. Standalone Feature Extractor: The pre-trained model, or some portion of the model, is used to pre-process images and extract relevant features.

    3. Integrated Feature Extractor: The pre-trained model, or some portion of the model, is integrated into a new model, but layers of the pre-trained model are frozen during training.

    4. Weight Initialization: The pre-trained model, or some portion of the model, is integrated into a new model, and the layers of the pre-trained model are trained in concert with the new model.
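
A sketch of option 3 (integrated, frozen feature extractor) with a pre-trained Keras backbone; VGG16 and the 2-class head are illustrative choices, not from the original notes:

from tensorflow import keras

base = keras.applications.VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                              # option 3: freeze the pre-trained layers
# For option 4 (weight initialization), set base.trainable = True and fine-tune everything.

model = keras.Sequential([
    base,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(2, activation="softmax"),    # new task-specific classifier head
])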

VISUALIZE CNN

Recurrent Neural Net (RNN)

RNN

A basic NN node with a loop: the previous output is merged with the current input (using tanh?), for the purpose of remembering history; for time series - to predict the next X based on the previous Y.

  • N to 1 = classification

  • N to N = predict frames in a movie

  • N\2 with time delay to N\2 = predict supply and demand

  • The vanishing gradient problem is much worse in RNNs (roughly 100 times), since gradients are multiplied through many time steps.

  • Gated networks like LSTM mitigate the vanishing gradient problem.

Experimental improvements:

Masking for RNNs - the idea is simple: we want to use variable-length inputs, but batched RNNs require a fixed-size (padded) input, so a mask of 1s and 0s helps the network understand the real length, i.e., where the information actually is in the input. Motivation: padded inputs would otherwise contribute to the loss, and we don't want that. (A sketch follows.)
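
A sketch of masking in Keras (assuming tf.keras; the padded toy sequences and sizes are illustrative):

import numpy as np
from tensorflow import keras

# Two variable-length sequences, zero-padded to a fixed length of 4.
padded = np.array([[1.0, 2.0, 3.0, 0.0],
                   [4.0, 5.0, 0.0, 0.0]]).reshape(2, 4, 1)

model = keras.Sequential([
    keras.Input(shape=(4, 1)),
    keras.layers.Masking(mask_value=0.0),   # timesteps whose features are all 0.0 are skipped downstream
    keras.layers.LSTM(8),
    keras.layers.Dense(1),
])
preds = model.predict(padded, verbose=0)    # padded steps do not drive the LSTM state or the loss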

LSTM

    • return_sequences returns the hidden state output for each input time step.

    • return_state returns the hidden state output and the cell state for the last input time step.

    • return_sequences and return_state can be used at the same time.

    • TimeDistributed layer - used to connect 3D outputs from LSTMs to Dense layers, in order to utilize the time element; otherwise the output gets flattened when the connection is direct, defeating the LSTM's purpose. Note: the nice trick is that the Dense layer's parameters are not multiplied by the number of time steps - TimeDistributed loops over each time step, applying the same Dense layer (same weights) to the LSTM's output one time step at a time. In this way, the output layer only needs one connection to each LSTM unit (plus one bias). (See the sketch after this list.)
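
A sketch of the three flags plus TimeDistributed (assuming tf.keras; shapes are illustrative):

from tensorflow import keras

inputs = keras.Input(shape=(5, 1))                         # 5 timesteps, 1 feature
seq, state_h, state_c = keras.layers.LSTM(
    8, return_sequences=True, return_state=True)(inputs)   # seq: (None, 5, 8); states: (None, 8) each
# The same Dense weights are applied at every timestep -> per-step outputs, no weight blow-up.
per_step = keras.layers.TimeDistributed(keras.layers.Dense(1))(seq)
model = keras.Model(inputs, per_step)                      # output shape: (None, 5, 1)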

For this reason, the number of training epochs needs to be increased to account for the smaller network capacity. I doubled it from 500 to 1000 to match the first one-to-one example

  • Sequence Learning Problem

  • One-to-One LSTM for Sequence Prediction

  • Many-to-One LSTM for Sequence Prediction (without TimeDistributed)

  • Many-to-Many LSTM for Sequence Prediction (with TimeDistributed)

Stateful vs Stateless: crucial for understanding how to leverage LSTM networks:

Machine Learning mastery:

1. Scale inputs to [-1, 1], because the internal activation in the LSTM cell is tanh.

return_sequences=True is needed for stacked LSTM layers (see the sketch below).
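
A sketch of a stacked LSTM (assuming tf.keras; sizes illustrative) - the lower layer must return its full sequence for the upper layer to consume:

from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(20, 1)),
    keras.layers.LSTM(32, return_sequences=True),   # emits all 20 hidden states for the next layer
    keras.layers.LSTM(16),                          # consumes the sequence, returns only the last state
    keras.layers.Dense(1),
])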

This is a nice helper add-on by Keras: in most other Keras examples you have seen, the training and test sets were passed into the fit method after you manually made the split. The value of having a validation set is significant, and it is a vital step for understanding how well your model is training. Ideally you want your training accuracy curve to be close to your validation curve, and the moment your validation curve falls below your training curve the alarm bells should go off - your model is probably busy over-fitting.

Keras is a wonderful framework for deep learning, and there are many different ways of doing things with plenty of helpers.

This tutorial clearly shows how to manipulate input construction, lstm output neurons and the target layer for the purpose of those three problems (1:1, 1:m, m:m).

BIDIRECTIONAL LSTM

(what is?) Wiki - The basic idea of BRNNs is to connect two hidden layers of opposite directions to the same output. By this structure, the output layer can get information from past and future states.

BRNN are especially useful when the context of the input is needed. For example, in handwriting recognition, the performance can be enhanced by knowledge of the letters located before and after the current letter.

It allows you to specify the merge mode, that is, how the forward and backward outputs should be combined before being passed on to the next layer. The options are:

  • 'sum': The outputs are added together.

  • 'mul': The outputs are multiplied together.

  • 'concat': The outputs are concatenated together (the default), providing double the number of outputs to the next layer.

  • 'ave': The average of the outputs is taken.

The default mode is to concatenate, and this is the method often used in studies of bidirectional LSTMs.
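
A sketch of the Keras Bidirectional wrapper with an explicit merge mode (assuming tf.keras; shapes illustrative):

from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(10, 1)),
    # merge_mode: "concat" (default, doubles the outputs), "sum", "mul" or "ave"
    keras.layers.Bidirectional(keras.layers.LSTM(16, return_sequences=True), merge_mode="concat"),
    keras.layers.TimeDistributed(keras.layers.Dense(1)),
])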

BACK PROPAGATION

UNSUPERVISED LSTM

GRU

  1. The update gate helps the model determine how much of the past information (from previous time steps) needs to be passed along to the future.

  2. The reset gate is essentially used by the model to decide how much of the past information to forget.

RECURRENT WEIGHTED AVERAGE (RNN-WA)

What is? A type of cell that converges to higher accuracy faster than LSTM.

It implements attention inside the recurrent neural network:

QRNN

GRAPH NEURAL NETWORKS (GNN)

GNN courses

Deep walk

Node2vec

Graphsage

SDNE - structural deep network embedding

Diff2vec

Splitter

Recent interest in graph embedding methods has focused on learning a single representation for each node in the graph. But can nodes really be best described by a single vector representation? In this work, we propose a method for learning multiple representations of the nodes in a graph (e.g., the users of a social network). Based on a principled decomposition of the ego-network, each representation encodes the role of the node in a different local community in which the nodes participate. These representations allow for improved reconstruction of the nuanced relationships that occur in the graph, a phenomenon that we illustrate through state-of-the-art results on link prediction tasks on a variety of graphs, reducing the error by up to 90%. In addition, we show that these embeddings allow for effective visual analysis of the learned community structure.

Nodevectors

SIGNAL PROCESSING NN (FFT, WAVELETS, SHAPELETS)

    1. Analyse signal variability and correlation

HIERARCHICAL RNN

NN-Sequence Analysis

SIAMESE NETWORKS (one shot)

  1. MULTI NETWORKS

Unread -

- improves VAE

Optimus

, intuition towards each node and what it represents in a vision. I.e., each face resembles one of K clusters.

, explains inference - averaging, and cons of the method.

NEAT stands for NeuroEvolution of Augmenting Topologies. It is a method for evolving artificial neural networks with a genetic algorithm.

HyperNEAT computes the connectivity of its neural networks as a function of their geometry.

The encoding in HyperNEAT, called compositional pattern-producing networks (CPPNs), is designed to represent patterns with regularities such as symmetry, repetition, and repetition with variation.

(WIKI) Compositional pattern-producing networks (CPPNs) are a variation of artificial neural networks (ANNs) that have an architecture whose evolution is guided by genetic algorithms.


The approach is more intuitive than the MLP.

Architecture_Simple

(What is?) According to Uber - an architecture that more accurately forecasts time series predictions and uncertainty estimations at scale: "how Uber has successfully applied this model to large-scale time series anomaly detection, enabling [it] to better accommodate rider demand during high-traffic intervals."

Training on multi-signal raw data; training X and Y are window-based, and the window size (lag) is determined in advance.


The blog post explains, for example, that with a CNN trained on apples, oranges, cats and dogs, an unrelated example such as a frog image may push the network to decide it's an apple; therefore we can't rely on the softmax probability as a confidence measure. The 'run prediction with dropout 100 times' approach should give us a confidence measure because it draws each weight from a Bernoulli distribution.

"By applying dropout to all the weight layers in a neural network, we are essentially drawing each weight from a Bernoulli distribution. In practice, this means that we can sample from the distribution by running several forward passes through the network. This is referred to as Monte Carlo dropout."

Taken from Yarin Gal's blog post. In this figure we see how sporadic the signal from a single forward pass is (black line) compared to a much cleaner signal from 100 dropout passes.

In his PhD thesis (linked from the blog post), Yarin Gal talks about uncertainty in neural networks and using BNNs. He may have proved this thesis, but I did not read it.

Old note: in order to trust your network's classification, you drop some of the neurons during prediction; you do this ~100 times and you average the results. Intuitively this will give you confidence in your classification and increase your classification accuracy, because only a partial, random part of your network participated in each classification, 100 times. Please note that softmax doesn't give you certainty.

The suggested Keras solution is to set training=True for every dropout layer and to add another dropout at the end of the model. Thanks, Sam.


- we systematically investigate the impact of class imbalance on classification performance of convolutional neural networks (CNNs) and compare frequently used methods to address the issue

Using several imbalance scenarios, on several known data sets, such as MNIST

on 1x1 cnn, for dim reduction, decreasing feature maps and other usages.

- “Small shifts -- even by a single pixel -- can drastically change the output of a deep network (bars on left). We identify the cause: aliasing during downsampling. We anti-alias modern deep networks with classic signal processing, stabilizing output classifications (bars on right). We even observe accuracy increases (see plot below).

In the last few years, experts have turned to global average pooling (GAP) layers to minimize overfitting by reducing the total number of parameters in the model. Similar to max pooling layers, GAP layers are used to reduce the spatial dimensions of a three-dimensional tensor. However, GAP layers perform a more extreme type of dimensionality reduction.

RESNET / DENSENET / UNET - the trick behind them: skip connections that carry the identity f(x) = x forward alongside the learned transformation (added in ResNet, concatenated in DenseNet/U-Net).

On CNNs' shortcomings: features can be identified without relations to each other in an image, i.e., changing the location of body parts will not affect the classification, and changing the orientation of the image will. The promise of capsule nets is that these two issues are solved.

there are more parts to the series

on TL using CNN

(What is an RNN?) by Andrej Karpathy - basically a lot of information about RNNs and their usage cases, e.g., 1 to N = frame captioning.

(how to initialize?) - don't worry about initialization, use normalization and GRU for big networks.

"Simplified RNN, with pytorch implementation" - changing the underlying mechanism in RNNs for the purpose of parallelizing calculation; it seems to work nicely in terms of speed, not sure about state-of-the-art results. The author claims he already mentioned these ideas (QRNN) a year before; however, it seems like his ideas have also been reviewed as (PixelRNN). It's probably best to read all 3 papers in chronological order and use the most optimal solution.

, enables you to build complex rnns with keras. Details on their significance are inside the link


Visual attention RNNS - Same idea as masking but on a window-based cnn.

LSTM - the first reference for LSTM on the web, but you should know the background before reading.

- You have to understand this concept before you dive in, i.e., the hidden state is the overall state of what we have seen so far, while the cell state is selective memory of the past. The hidden state (h) carries the information about what an RNN cell has seen over time and supplies it to the present time step, such that the loss function depends not only on the data it is seeing at this time instant, but also on data it has seen historically.

- a comparison of many LSTMs variants and they are pretty much the same performance wise

- comparison of lstm variants, vanilla is mostly the best, forget and output gates are the most important in terms of performance. Other conclusions in the paper..

Mastery on

Mastery on - but makes sense for all types of networks

Mastery on r

Mastery on ,

Mastery on and seq2seq

Mastery on , as a whole model wrap, or on every layer in the model which is equivalent and preferred.

Mastery on for sequence prediction

Unread - sentiment classification of IMDB movies using

- (jakob) single point prediction, sequence prediction and shifted-sequence prediction with code.


on stateful vs stateless, intuition mostly with code, but not 100% clear

important notes:

2. stateful=True - needs resets of internal states; False = stateless. Great info & results, with seeding, with training resets (and not) and prediction resets (and not) - note: empirically matching the shampoo input, network config, etc.

3. , and how to use each one and both at the same time.

4. Stacked LSTM - each layer represents a higher level of abstraction in TIME!

- A good explanation about the differences between input_shape, input_dim, and what they are. Additionally about how a layer's inputs and outputs are calculated based on the input shape, and the Sequential model vs the functional API model.

A comparison of LSTM/GRU/MGU with batch normalization and various initializations; GRU/Xavier/Batch are the best and recommended for RNNs.

It looks like LSTM and GRU are competitive with the mutated architectures (I believe it's only in pytorch). Adding a bias to the LSTM works (a bias of 1, as recommended in the paper), but generally speaking there is no conclusive empirical evidence that one type of network is better than the other for all tests; however, the mutated networks tend to win over the LSTM/GRU variants.

- unit_forget_bias: Boolean. If True, add 1 to the bias of the forget gate at initialization. Setting it to True will also force bias_initializer="zeros". This is recommended in Jozefowicz et al.

- The validation_split variable in Keras is a value between [0..1]. Keras proportionally splits your training set by the value of the variable: the first part is used for training and the second part for validation after each epoch.

: unclear.

- Using maxlen, it will either pad with zeros if the sequence is shorter, or truncate it if longer.

Imbalanced classes? Use class_weights; another explanation about class_weights and sample_weights.

SKlearn formula for balanced class weights and why it works.

, but with a focus on LSTM one-to-one, one-to-many and many-to-many - here TimeDistributed applies a Dense layer to each output time step from the LSTM, which uses return_sequences=True for that purpose.

explanation- It involves duplicating the first recurrent layer in the network so that there are now two layers side-by-side, then providing the input sequence as-is as input to the first layer and providing a reversed copy of the input sequence to the second.


- To solve the vanishing gradient problem of a standard RNN, the GRU uses so-called update and reset gates. Basically, these are two vectors which decide what information should be passed to the output. The special thing about them is that they can be trained to keep information from long ago, without washing it out through time, or to remove information which is irrelevant to the prediction.

1. the Keras implementation is available at

2. the whitepaper is at

(amazing) - really good insight into what they do (compressing data, vs adjacency graphs, vs graphs, high-dim relations, etc.)

(amazing)

Octavian in medium on graphs, , clever, mcgraph, regression, classification, embedding on graphs.


, w2v, pytorch w2v, networkx, sparse matrices, matrix factorization, dictionary optimization, part 1 here

, original:

Really good -

Michael Bronstein’s (worth reading)

, paper, examples - The graph attentional layer utilised throughout these networks is computationally efficient (does not require costly matrix operations, and is parallelizable across all nodes in the graph), allows for (implicitly) assigning different importances to different nodes within a neighborhood while dealing with different sized neighborhoods, and does not depend on knowing the entire graph structure upfront—thus addressing many of the theoretical issues with approaches.

Medium on

struc2vec: Learning Node Representations from Structural Identity - the struc2vec algorithm learns continuous representations for nodes in any graph; struc2vec captures structural equivalence between nodes.

, from ML to GNN.

- graphs, sets, groups, GNNs.

and medium on

,

“Is a Single Embedding Enough? Learning Node Representations that Capture Multiple Social Contexts”

16.

17. , similar to deep walk with node skips. - lots of improvements, works in scale due to lower size representations, improves results, etc.

, The fastest network node embeddings in the west

- decomposing frequencies


, compression, detect edges, detect features with various orientation, analyse signal power, detect and localize transients, change points in time series data and detect optimal signal representation (peaks etc) of time freq analysis of images and data.

Can also be used to analyse images in space, frequencies, orientation, identifying coherent time oscillations in time series.

(did not read) - can this be applied to other time series prediction?

How to use AE for dimensionality reduction + code
Keras.io blog post about AE’s
replicate post
Examples of vanilla, multi layer, CNN and sparse AE’s
Another example of CNN-AE
Another AE tutorial
Hinton’s coursera course
A great tutorial on how does the clusters look like after applying PCA/ICA/AE
Another great presentation on PCA vs AE,
another one
StackE
Autoencoder tutorial with python code and how to encode after
mastery
Git code for low dimensional auto encoder
Bart denoising AE
Attention based seq to seq auto encoder
git
AE for anomaly detection, fraud detection
Simple explanation
Pixel art VAE
Unread - another VAE
Pixel GAN VAE
Disentangled VAE
pretrained VAE
paper
Microsoft blog
Sompy
minisom!
Many graph examples
example
Step by step with examples, calculations
Adds intuition regarding “magnetism”’
Implementation and faces
Medium on kohonen networks, i.e., SOM
Som on iris
Simple explanation
Algorithm, formulas
NEAT
A great article about NEAT
HyperNEAT
compositional pattern producing networks
https://en.wikipedia.org/wiki/Compositional_pattern-producing_network
A great HyperNeat tutorial on Medium.
RBF layer in Keras.
RBFN
BNN
Bayesian neural network (BNN)
Neural networks
MEDIUM with code how to do it.
Why do we need a confidence measure when we have a softmax probability layer?
Bernoulli distribution
Monte Carlo dropout
blog post
Phd Thesis by Yarin
The idea behind uncertainty is (
paper here
Medium post on prediction with drop out
solution for keras
an excellent and thorough explanation about LeNet
Illustrated 10 CNNS architectures
A study that deals with class imbalance in CNN’s
How to setup a conv1d in keras, most importantly how to reshape your input vector
Mastery on Character ngram cnn for sentiment analysis
Mastery
1. Using mask rnn for object detection
Making cnn shift invariance
Intuitions to the differences between max and average pooling:
GLOBAL MAX pooling
Hinton’s controversy thoughts on pooling
For improved performance
RESNET, DENSENET UNET
Explaination here, with some examples
The solution to CNN’s shortcomings
Understanding capsule nets - part 2,
Mastery
How to
The Unreasonable Effectiveness of Recurrent Neural Networks
Benchmarking RNN networks for text
Ref
Controversy regarding said work
first
incremental
RNNCELLS - recurrent shop
Source 1
source 2
Paper
The best, hands down, lstm post out there
what is?
Hidden state vs cell state
Illustrated rnn lstm gru
Paper
Paper
unrolling RNN’s introductory post
under/over fitting lstms
Return_sequence and return_state in keras LSTM
understanding stateful vs stateless
stateful stateless for time series
timedistributed layer
wrapping cnn-lstm with time distributed
visual examples
Keras and LSTM
Very important - how to interpret LSTM neurons in keras
LSTM for time-series
A good description on what it is and how to use it.
ML mastery
Philippe remy
A good tutorial on LSTM:
stateful
HERE
Another explanation/tutorial about stateful lstm, should be thorough.
what is return_sequence, return_states
stacked LSTM
Keras Input shape
comparison
Benchmarking LSTM variants
paper
BIAS 1 in keras
Jozefowicz et al.
Validation_split arg
Return_sequence
Sequence.pad_sequences
Using batch size for LSTM in Keras
class_weight
here
example
number of units in LSTM
Calculate how many params are in an LSTM layer?
Understanding timedistributed in Keras
Another
Another simplified example
A great Slide about back prop, on a simple 3 neuron network, with very easy to understand calculations.
Paper
paper2
paper3
In keras
A tutorial about GRU
https://github.com/keisuke-nakata/rwa
https://arxiv.org/pdf/1703.01253.pdf
Potential competitor to the transformer
Why i am luke warm about GNN’s
Graphical intro to GNNs
Learning on graphs youtube - uriel singer
Benchmarking GNN’s, methodology, git, the works.
Awesome graph classification on github
A really good intro to graph networks, too long too summarize
Application of graph networks
Recommender systems using GNN
(how to find product relations, important: creating negative samples)
Transformers are GNN
Transformers are graphs, not the typical embedding on a graph, but a more holistic approach to understanding text as a graph.
Cnn for graphs
Staring with gnn
Basics deep walk and graphsage
Application of gnn
Central page for Graph deep learning articles on Medium
GAT graphi attention networks
Intro, basics, deep walk, graph sage
Struc2vec
youtube
machine learning with graphs by Stanford
Graph deep learning course
youtube
Git
Paper
Medium
W2v, deep walk, graph2vec, n2v
Git
Stanford
Elior on medium
youtube
Paper
medium
medium
Git
git
paper
Self clustering graph embeddings
Walklets
Fourier Transform
WAVELETS On youtube (4 videos)
used for denoising
reconstruct time and frequencies
githubcode
A causal framework for explaining the predictions of black-box sequence-to-sequence models
Siamese CNN, learns a similarity between images, not to classify
Visual tracking, explains contrastive and triplet loss
One shot learning, very thorough, baseline vs siamese
What is triplet loss
Google whitening black boxes using multi nets, segmentation and classification
Git
Optimus