Embedding

Intro

Embeddings from the ground up (singlelunch) (amazing)

VECTOR SIMILARITY SEARCH

  1. Faiss - a library for efficient similarity search (see the minimal sketch after this list).

  2. Benchmarking - complete with almost everything imaginable.

  3. Elastic search - dense vector search.

  4. Google Cloud Vertex Matching Engine - NN search / vector similarity search.

    1. Use cases

      1. Recommendation engines

      2. Search engines

      3. Ad targeting systems

      4. Image classification or image search

      5. Text classification

      6. Question answering

      7. Chat bots

    2. Features

      1. Low latency

      2. High recall

      3. managed

      4. Filtering

      5. scale

  5. Pinecone - a fully managed vector database that makes it easy to add vector search to production applications, with no more hassles of benchmarking and tuning algorithms or building and maintaining infrastructure for vector search.

  6. Nmslib (benchmarked - benchmarks of approximate nearest neighbor libraries in Python) is a Non-Metric Space Library (NMSLIB): an efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-metric spaces.

  7. ScaNN - Google's library for efficient approximate nearest-neighbor (vector similarity) search.

  8. Vespa.ai - make AI-driven decisions using your data, in real time, at any scale, with unbeatable performance.

  9. Weaviate - an open-source vector search engine and vector database. Weaviate uses machine learning to vectorize and store data, and to find answers to natural language queries (or any other media type).

  10. Neural Search with BERT and Solr - indexing BERT vector data in Solr and searching with full traversal.

  11. Fun With Apache Lucene and BERT Embeddings - this post goes much deeper, down to the similarity-search algorithm at the Apache Lucene level, and upgrades the code from Lucene 6.6 to 8.0.

  12. Speeding up BERT Search in Elasticsearch - neural search in Elasticsearch: from vanilla, to KNN, to hardware acceleration.

  13. Ask Me Anything about Vector Search - in this session Max Irwin and Dmitry Kan discussed the major topics of vector search, ranging from its areas of applicability, to comparing it to good ol' sparse search (TF-IDF/BM25), to its readiness for prime time and which specific engineering elements need further tuning before offering it to users.

  14. Search with BERT vectors in Solr and Elasticsearch - GitHub repository used for experiments with Solr and Elasticsearch using DBpedia abstracts, comparing Solr, vanilla Elasticsearch, elastiknn-enhanced Elasticsearch, OpenSearch, and the GSI APU.

  15. Not All Vector Databases Are Made Equal - a detailed comparison of Milvus, Pinecone, Vespa, Weaviate, Vald, GSI and Qdrant.

  16. Vector Podcast - hosted by Dmitry Kan, interviewing the makers of the vector / neural search industry. Available on YouTube, Spotify, Apple Podcasts and RSS.

  17. Players in Vector Search - video recording and slides of a talk presented at the London IR Meetup on the players, algorithms, software and use cases in vector search.

  18. Hybrid retrieval using search and semantic search (paper).
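
A minimal sketch of the kind of index the libraries above provide, using Faiss; this assumes `pip install faiss-cpu numpy`, and the random vectors are stand-ins for real embeddings.

```python
# Minimal Faiss sketch, assuming faiss-cpu is installed; random vectors stand in for real embeddings.
import numpy as np
import faiss

d = 128                                                # embedding dimensionality
xb = np.random.random((10_000, d)).astype("float32")   # database vectors
xq = np.random.random((5, d)).astype("float32")        # query vectors

index = faiss.IndexFlatL2(d)          # exact L2 search; swap for an ANN index (IVF/HNSW) at scale
index.add(xb)
distances, ids = index.search(xq, 5)  # top-5 nearest neighbours per query
print(ids)
```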

TOOLS

FLAIR

  1. Named-Entity Recognition (NER): it can recognize whether a word represents a person, a location or another name in the text.

  2. Part-of-Speech Tagging (PoS): tags each word in a given text with the part of speech it belongs to.

  3. Text Classification: classifying text according to given criteria (labels).

  4. Training Custom Models: Making our own custom models.

  5. It comprises popular and state-of-the-art word embeddings, such as GloVe, BERT, ELMo and character embeddings. They are very easy to use thanks to the Flair API.

  6. Flair's interface allows us to combine different word embeddings and use them to embed documents, which in turn leads to a significant uptick in results (see the sketch after this list).

  7. 'Flair Embedding' is the signature embedding provided within the Flair library, powered by contextual string embeddings.

  8. Flair supports a number of languages – and is always looking to add new ones
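
As a concrete illustration of combining embeddings (point 6 above), here is a minimal Flair sketch that stacks GloVe with Flair's contextual string embeddings; the model names are the standard ones shipped with Flair.

```python
# Minimal sketch, assuming `pip install flair`.
from flair.data import Sentence
from flair.embeddings import WordEmbeddings, FlairEmbeddings, StackedEmbeddings

stacked = StackedEmbeddings([
    WordEmbeddings("glove"),          # classic static word vectors
    FlairEmbeddings("news-forward"),  # contextual string embeddings ("Flair Embedding")
])

sentence = Sentence("The grass is green .")
stacked.embed(sentence)                       # embeds the tokens in place
for token in sentence:
    print(token.text, token.embedding.shape)  # one concatenated vector per token
```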

HUGGING FACE

    1. how to make a custom pyTorch LSTM with custom activation functions,

    2. how the PackedSequence object works and is built,

    3. how to convert an attention layer from Keras to pyTorch,

    4. how to load your data in pyTorch: DataSets and smart Batching,

    5. how to reproduce Keras weights initialization in pyTorch.

LANGUAGE EMBEDDINGS

History

    1. RNNs are sequential: the same word in a different position will have a different encoding, because the input from the previous word is inherently different.

    2. Attention without positional information gives the same word the same encoding regardless of its position.

    3. Relative positional encoding looks at a window around each word and adds a distance vector describing how many words come before and after, which fixes the problem.

    4. The authors hypothesized that precise relative position information is not useful beyond a certain distance.

    5. Clipping the maximum distance enables the model to generalize to sequence lengths not seen during training (a toy sketch of the clipping follows this list).
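
A toy sketch of the clipping idea (not the authors' code): distances beyond a window k are clipped, so the model only ever sees 2k+1 distinct relative positions, regardless of sequence length; k here is a hypothetical window size.

```python
def relative_position(i, j, k=4):
    """Relative position of token j w.r.t. token i, clipped to [-k, k] and shifted to [0, 2k]."""
    return max(-k, min(k, j - i)) + k

print([relative_position(0, j) for j in range(8)])  # [4, 5, 6, 7, 8, 8, 8, 8]
```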

Embedding Foundation Knowledge

    1. Baseline Averaged Sentence Embeddings

    2. Doc2Vec

    3. Neural-Net Language Models (Hands-on Demo!)

    4. Skip-Thought Vectors

    5. Quick-Thought Vectors

    6. InferSent

    7. Universal Sentence Encoder

Language modeling

  1. finetune - scikit-learn-inspired model fine-tuning for natural language processing.

Embedding spaces

Embedding Models

Cat2vec

ENTITY EMBEDDINGS

  1. Finally, what they do is label-encode each feature (using LabelEncoder) into an integer feature, push each feature into its own embedding layer with input length 1 and an embedding size chosen by a rule of thumb (so it seems), merge all the layers, train a synthetic regression/classification task, and grab the weights of the corresponding embedding layer (a hedged sketch follows).
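
A hedged Keras sketch of that recipe, with toy data and hypothetical column names; the rule-of-thumb embedding size used here is one common heuristic, not the only one.

```python
# Sketch only: label-encode each categorical column, give it its own Embedding layer,
# train on the supervised task, then read back the embedding weights.
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import LabelEncoder

# toy stand-in data: two categorical columns and a binary target
raw = {"shop": ["a", "b", "a", "c"], "day": ["mon", "tue", "mon", "wed"]}
y = np.array([0, 1, 0, 1])

inputs, embedded, encoders = [], [], {}
for col, values in raw.items():
    enc = LabelEncoder().fit(values)
    encoders[col] = enc
    n_cat = len(enc.classes_)
    dim = min(50, (n_cat + 1) // 2)          # a common rule-of-thumb embedding size
    inp = tf.keras.Input(shape=(1,), name=col)
    emb = tf.keras.layers.Embedding(n_cat, dim, name=f"emb_{col}")(inp)
    embedded.append(tf.keras.layers.Flatten()(emb))
    inputs.append(inp)

x = tf.keras.layers.Concatenate()(embedded)   # merge all embedding layers
x = tf.keras.layers.Dense(16, activation="relu")(x)
out = tf.keras.layers.Dense(1, activation="sigmoid")(x)

model = tf.keras.Model(inputs, out)
model.compile(optimizer="adam", loss="binary_crossentropy")
X = [encoders[c].transform(v).reshape(-1, 1) for c, v in raw.items()]
model.fit(X, y, epochs=2, verbose=0)

shop_vectors = model.get_layer("emb_shop").get_weights()[0]  # the learned entity embeddings
print(shop_vectors.shape)                                    # (3 categories, embedding dim)
```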

ALL2VEC EMBEDDINGS

  1. emoji 2vec (below)

EMOJIS

    1. how to make a custom pyTorch LSTM with custom activation functions,

    2. how the PackedSequence object works and is built,

    3. how to convert an attention layer from Keras to pyTorch,

    4. how to load your data in pyTorch: DataSets and smart Batching,

    5. how to reproduce Keras weights initialization in pyTorch.

    1. Initialize embedding layers for each categorical input;

    2. For each category, compute dot-products among other embedding representations. These are our ‘groups’ at the categorical level;

    3. Summarize each ‘group’ adopting an average pooling;

    4. Concatenate ‘group’ averages;

    5. Apply regularization techniques such as BatchNormalization or Dropout;

    6. Output probabilities.

WORD2VEC

  1. Cleaning datasets using weighted w2v sentence encoding, then PCA and Isolation Forest to remove outlier sentences.

  2. Mean w2v - averaging a sentence's word vectors (a minimal sketch follows this list).

  3. Sequential w2v embeddings.
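
A minimal gensim sketch of the "mean w2v" idea above; the corpus is a toy stand-in, and in practice you would train on a large corpus or load pretrained vectors.

```python
# Sentence vector = average of its word vectors (toy corpus, illustrative parameters).
import numpy as np
from gensim.models import Word2Vec

corpus = [["the", "cat", "sat"], ["the", "dog", "barked"], ["cats", "and", "dogs"]]
model = Word2Vec(corpus, vector_size=50, min_count=1, epochs=20)

def mean_w2v(tokens, model):
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.wv.vector_size)

print(mean_w2v(["the", "cat"], model)[:5])
```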

GLOVE

FastText

SENTENCE EMBEDDING

Sense2vec

  1. >>> model.similarity('fair_game|NOUN', 'game|NOUN')
     0.034977455677555599
     >>> model.similarity('multiplayer_game|NOUN', 'game|NOUN')
     0.54464530644393849

SENT2VEC aka “skip-thoughts”

USE - Universal sentence encoder

BERT+W2V

PARAGRAPH2Vec

Doc2Vec

A git repo on fine-tuning using the Hugging Face transformers package.

YouTube series - ep1, 2, 3, 3b.

(great!)

A

The Indian guy's notebook.


Ruder on language modelling as the next ImageNet - language modelling, the last approach mentioned, has been shown to capture many facets of language relevant for downstream tasks, such as long-term dependencies, hierarchical relations, and sentiment. Compared to related unsupervised tasks such as skip-thoughts and autoencoding, language modelling performs better on syntactic tasks even with less training data.

A tutorial about w2v and skip-thoughts - with code! Language modelling is specifically important here: "Our second method is training a language model to represent our sentences. A language model describes the probability of a text existing in a language. For example, the sentence 'I like eating bananas' would be more probable than 'I like eating convolutions.' We train a language model by slicing windows of n words and predicting what the next word will be in the text."

ELMO -

BERT (python git) - "We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks."

OpenAI on language modelling - "We've obtained state-of-the-art results on a suite of diverse language tasks with a scalable, task-agnostic system, which we're also releasing. Our approach is a combination of two existing ideas: transformers and unsupervised pre-training." (read the paper, view the code).

finetune ships with a pre-trained language model from "Improving Language Understanding by Generative Pre-Training" and builds off the OpenAI/finetune-language-model repository.

Did not read - The Annotated Transformer - a Jupyter notebook on the transformer, with annotations.

Medium on Dissecting BERT (with an appendix).

A good overview of sentence embedding methods - w2v, fasttext, sent2vec, skip-thoughts, doc2vec.

Sent2vec - a sentence embedding is defined as the average of the source word embeddings of its constituent words. The model is furthermore augmented by also learning source embeddings for not only unigrams but also n-grams of words present in each sentence, and averaging the n-gram embeddings along with the words.

Wordrank vs fasttext vs w2v - which is the better word similarity algorithm.

Doc2vec tutorial by gensim - Doc2vec (aka paragraph2vec, aka sentence embeddings) modifies the word2vec algorithm to unsupervised learning of continuous representations for larger blocks of text, such as sentences, paragraphs or entire documents. Most importantly, this tutorial has crucial information about the implementation parameters that should be read before using it.
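
A minimal gensim Doc2Vec sketch (toy corpus, illustrative parameters; see the tutorial above for the parameters that actually matter).

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [TaggedDocument(words=["the", "cat", "sat"], tags=[0]),
        TaggedDocument(words=["the", "dog", "barked"], tags=[1])]
model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40)

vec = model.infer_vector(["a", "cat", "sat", "down"])  # embed an unseen document
print(model.dv.most_similar([vec], topn=1))            # nearest training document
```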

Lbl2Vec (medium) is an algorithm for unsupervised document classification and unsupervised document retrieval. It automatically generates jointly embedded label, document and word vectors and returns documents of categories modeled by manually predefined keywords.

Skip-thought (git: https://github.com/ryankiros/skip-thoughts) - where word2vec attempts to predict surrounding words from certain words in a sentence, skip-thought vectors extend this idea to sentences: they predict surrounding sentences from a given sentence. NOTE: unlike the other methods, skip-thought vectors require the sentences to be ordered in a semantically meaningful way. This makes the method difficult to use for domains such as social media text, where each snippet of text exists in isolation.

FastSent - skip-thought vectors are slow to train. FastSent attempts to remedy this inefficiency while expanding on the core idea of skip-thought: that predicting surrounding sentences is a powerful way to obtain distributed representations. Formally, FastSent represents a sentence as the simple sum of its word embeddings, making training efficient. The word embeddings are learned so that the inner product between the sentence embedding and the word embeddings of surrounding sentences is maximized. NOTE: FastSent sacrifices word order for the sake of efficiency, which can be a large disadvantage depending on the use case.

Weighted sum of words - in this method, each word vector is weighted by the factor \frac{a}{a + p(w)}, where a is a hyperparameter and p(w) is the (estimated) word frequency. This is similar to TF-IDF weighting, where more frequent terms are weighted down. NOTE: word order and surrounding sentences are ignored as well, limiting the information that is encoded.
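
A small sketch of that weighting, with toy word frequencies and random stand-in vectors; in practice p(w) is estimated from a large corpus.

```python
import numpy as np

a = 1e-3                                                  # the weighting hyperparameter
p = {"the": 0.05, "cat": 0.001, "sat": 0.0005}            # toy estimated word frequencies p(w)
vectors = {w: np.random.random(50) for w in p}            # stand-in word vectors

def weighted_sentence_vector(tokens):
    weights = np.array([a / (a + p[t]) for t in tokens])  # frequent words get small weights
    mat = np.array([vectors[t] for t in tokens])
    return (weights[:, None] * mat).mean(axis=0)

print(weighted_sentence_vector(["the", "cat", "sat"])[:5])
```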

InferSent (Facebook) - InferSent is a sentence embedding method that provides semantic representations for English sentences. It is trained on natural language inference data and generalizes well to many different tasks. ABSTRACT: we show how universal sentence representations trained using the supervised data of the Stanford Natural Language Inference datasets can consistently outperform unsupervised methods like SkipThought vectors on a wide range of transfer tasks. Much like how computer vision uses ImageNet to obtain features, which can then be transferred to other tasks, our work tends to indicate the suitability of natural language inference for transfer learning to other NLP tasks.

Universal Sentence Encoder (Google) - notebook, git. The Universal Sentence Encoder encodes text into high-dimensional vectors that can be used for text classification, semantic similarity, clustering and other natural language tasks. The model is trained and optimized for greater-than-word-length text, such as sentences, phrases or short paragraphs. It is trained on a variety of data sources and a variety of tasks with the aim of dynamically accommodating a wide variety of natural language understanding tasks. The input is variable-length English text and the output is a 512-dimensional vector. We apply this model to the STS benchmark for semantic similarity, and the results can be seen in the example notebook made available. The universal-sentence-encoder model is trained with a deep averaging network (DAN) encoder.
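
A hedged TF-Hub sketch for the DAN-based Universal Sentence Encoder; the hub URL below is the commonly published one, and the model is downloaded on first use.

```python
# Assumes `pip install tensorflow tensorflow_hub`.
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
vectors = embed(["How old are you?", "What is your age?"])
print(vectors.shape)  # (2, 512) - one 512-dimensional vector per input sentence
```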

Multilingual universal sentence encoder (paper) - no Hebrew.

Pair2vec - the paper proposes new methods for learning and using embeddings of word pairs that implicitly represent background knowledge about such relationships, i.e., using p2v information with existing models to increase performance. Experiments show that the pair embeddings can complement individual word embeddings, and that they are perhaps capturing information that eludes the traditional interpretation of the Distributional Hypothesis.

Part 1: label encoder / ordinal, one-hot, one-hot with a rare bucket, hash.

A general-purpose embedding paper with code, and entity embeddings for categorical data.

Star -

Using embeddings on tabular data (specifically categorical) with fastai, without limiting ourselves to pytorch - the material from this post is covered in much more detail starting around 1:59:45 in the Lesson 3 video and continuing in Lesson 4 of their free online course, Practical Deep Learning for Coders. To see example code of how this approach can be used in practice, check out the Lesson 3 jupyter notebook. Perhaps Saturday and Sunday have similar behavior, and maybe Friday behaves like an average of a weekend and a weekday. Similarly, for zip codes, there may be patterns for zip codes that are geographically near each other, and for zip codes that are of similar socio-economic status. The jupyter notebook doesn't seem to have the embedding example they are talking about.

Rossman on kaggle used entity-embeddings (here, github, paper).

Embedder - git code for a simplified version of the entity embedding above.

Fast.ai on embeddings for tabular data, i.e., continuous and categorical data.

categorical data +

+

Tweet2Vec - code in theano, paper.

Clustering of tweet2vec, paper.

Paper:

Diff2vec - might be useful on social network graphs.

Char2vec - Git: https://github.com/IntuitionEngineeringTeam/chars2vec, a similarity measure for words with typos (paper: https://arxiv.org/abs/1708.00524).

1. ,

for emoji, sentiment, sarcasm,

EMOJI2VEC - medium article with keras code, another paper on classifying tweets using emojis.

Group2vec - git and medium; a multi-input embedding network using the six steps listed above (under EMOJIS), plus two other methods that involve groupby and applying entropy and join/countvec per class. Really interesting.

Monitor train loss using callbacks for word2vec.

, for correspondence analysis,

Chris McCormick, post #2 on negative sampling - "Negative sampling addresses this by having each training sample only modify a small percentage of the weights, rather than all of them. With negative sampling, we are instead going to randomly select just a small number of 'negative' words (let's say 5) to update the weights for. (In this context, a 'negative' word is one for which we want the network to output a 0.) We will also still update the weights for our 'positive' word (which is the word 'quick' in our current example). The 'negative samples' (that is, the 5 output words that we'll train to output 0) are chosen using a 'unigram distribution': essentially, the probability of selecting a word as a negative sample is related to its frequency, with more frequent words being more likely to be selected as negative samples."
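
A minimal gensim sketch matching that description: skip-gram with 5 negative samples per positive pair (toy corpus, illustrative parameters).

```python
from gensim.models import Word2Vec

corpus = [["the", "quick", "brown", "fox", "jumps"], ["the", "lazy", "dog", "sleeps"]]
# sg=1 -> skip-gram, negative=5 -> five "negative" words updated per training sample
model = Word2Vec(corpus, sg=1, negative=5, vector_size=50, min_count=1, epochs=20)
print(model.wv.most_similar("quick", topn=3))
```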

Hierarchical softmax training - i.e., a Huffman binary tree over the vocabulary, learning the internal tree nodes (the path) as the probability vector, instead of having len(vocabulary) output neurons.

Another w2v tutorial, with starter code and some usage examples of similarity.

Not read yet, but looks promising.

W2v analogies of the form x : y :: a : b, plus code, plus insights into why it works and when it doesn't (a small sketch follows). Examples: presence : absence :: happy : unhappy; absence : presence :: happy : proud; abundant : scarce :: happy : glad; refuse : accept :: happy : satisfied; accurate : inaccurate :: happy : disappointed; admit : deny :: happy : delighted; never : always :: happy : Said_Hirschbeck; modern : ancient :: happy : ecstatic.
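
A hedged gensim sketch of that x : y :: a : b arithmetic, using small pretrained GloVe vectors from gensim's downloader (the model name is the standard gensim-data one and is downloaded on first use).

```python
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-50")  # ~66MB download on first use
# "presence : absence :: happy : ?"  ->  happy - presence + absence
print(wv.most_similar(positive=["happy", "absence"], negative=["presence"], topn=3))
```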

W2v vs. glove performance comparison - glove wins in accuracy (%) and time.

How glove and w2v work - "GloVe takes a different approach. Instead of extracting the embeddings from a neural network that is designed to perform a surrogate task (predicting neighbouring words), the embeddings are optimized directly so that the dot product of two word vectors equals the log of the number of times the two words will occur near each other (within 5 words, for example). For example, if 'dog' and 'cat' occur near each other 10 times in a corpus, then vec(dog) · vec(cat) = log(10). This forces the vectors to somehow encode the frequency distribution of which words occur near them."


, data cleaning and word similarity

Gensim - similarity, analogies.

An alternative to gensim - promises speed and out-of-the-box support for many embeddings.

A comparison of w2v vs. fasttext using gensim - "Word2Vec embeddings seem to be slightly better than fastText embeddings at the semantic tasks, while the fastText embeddings do significantly better on the syntactic analogies. Makes sense, since fastText embeddings are trained for understanding morphological nuances, and most of the syntactic analogies are morphology based."

Syntactic means syntax, as in tasks that have to do with the structure of the sentence; these include tree parsing and POS tagging. They usually need less context and a shallower understanding of world knowledge.

Semantic tasks mean meaning-related tasks, a higher level of the language tree; these typically involve a higher-level understanding of the text and might involve tasks such as question answering, sentiment analysis, etc.

As for analogies, he is referring to the mathematical, operator-like properties exhibited by word embeddings. In this context a syntactic analogy would be related to plurals, tense or gender, and a semantic analogy would be a word-meaning relationship such as king - man + woman ≈ queen, etc. See for instance this article (and many others).

A comparison of fasttext vs. glove vs. w2v on a single dataset; in terms of performance, fasttext wins by a small margin.

Medium on the universal sentence encoder, with code.

Sense2vec (blog, github): using spacy or not, with w2v using POS/entity tags to find similarities, based on Reddit data. "We follow Trask et al. in adding part-of-speech tags and named entity labels to the tokens. Additionally, we merge named entities and base noun phrases into single tokens, so that they receive a single vector."

- usage examples, parallel training, a detailed comparison against gensim doc2vec

Shuffle before training each epoch in d2v in order to fight overfitting.

embeddings from the ground up singlelunch
Faiss
Benchmarking
Singlestore
dense vector
NN search
vector similarity search
Nmslib
benchmarked
Vespa.ai
Weaviate
open source
Neural Search with BERT and Solr
Fun With Apache Lucene and BERT Embeddings
Speeding up BERT Search in Elasticsearch
Ask Me Anything about Vector Search
Search with BERT vectors in Solr and Elasticsearch
Not All Vector Databases Are Made Equal
Vector Podcast
Players in Vector Search: Video
Hybrid retrieval using search and semantic search
Git
Hugging face pytorch transformers
Hugging face nlp pretrained
hugging face on emotions
thorough tutorial on bert
Code
ep1
2
3
3b
Google’s intro to transformers and multi-head self attention
How self attention and relative positioning work
From bert to albert
All the latest buzz algos
Summary of them
8 pretrained language embeddings
Hugging face pytorch transformers
Hugging face nlp pretrained
Medium on Introduction into word embeddings, sentence embeddings, trends in the field.
git
his git
Shay palachy on word embedding covering everything from bow to word/doc/sent/phrase.
Another intro, not as good as the one above
Using sklearn vectorizer to create custom ones, i.e. a vectorizer that does preprocessing and tfidf and other things.
TFIDF - n-gram based top weighted tfidf words
Gensim bi-gram phraser/phrases analyser/converter
Countvectorizer, stemmer, lemmatization code tutorial
Current 2018 best universal word and sentence embeddings -> elmo
5-part series on word embeddings
part 2
3
4 - cross lingual review
5-future trends
Word embedding posts
Facebook github for embedings called starspace
Medium on Fast text / elmo etc
Ruder on language modelling as the next imagenet
long-term dependencies
hierarchical relations
sentiment
language modelling performs better on syntactic tasks even with less training data
tutorial
Unread - universal language model fine tuning for text-classification
medium
Open.ai on language modelling
transformers
unsupervised pre-training
READ PAPER
VIEW CODE
finetune
“Improving Language Understanding by Generative Pre-Training”
OpenAI/finetune-language-model repository
The annotated Transformer
Dissecting Bert
appendix
Medium on distilling 6 patterns from bert
A good overview of sentence embedding methods
A very good overview of word embeddings
Intro to word embeddings - lots of images
A very long and extensive thesis about embeddings
Sent2vec by gensim
Sent2vec vs fasttext - with info about s2v parameters
Wordrank vs fasttext vs w2v comparison
W2v vs glove vs sppmi vs svd by gensim
Medium on a gentle intro to d2v
Doc2vec tutorial by gensim
Lbl2Vec
medium
Git for word embeddings - taken from mastery’s nlp course
Skip-thought -
https://github.com/ryankiros/skip-thoughts
Fastsent
Infersent by facebook
paper
Universal sentence encoder - google
notebook
git
STS benchmark
example notebook
Multi language universal sentence encoder
paper
Fast text python tutorial
Label encoder/ ordinal, One hot, one hot with a rare bucket, hash
Part2: cat2vec using w2v
General purpose embedding paper with code somewhere
Using embeddings on tabular data, specifically categorical - introduction
the Lesson 3 video
Lesson 4
Practical Deep Learning for Coders
Lesson 3 jupyter notebook
Rossman on kaggle
here
github
paper
Medium on rossman - good
Embedder
Entity2vec
Categorical using keras
ALL ???-2-VEC ideas
post
Entity embedding for
notebook
Kaggle taxi competition + code
Ross man competition - entity embeddings, code missing
alternative code
CODE TO CREATE EMBEDDINGS straight away, based onthe ideas by cheng guo in keras
PIN2VEC - pinterest embeddings using the same idea
Tweet2Vec
paper
Clustering
paper
Character neural embeddings for tweet clustering
paper
code
Char2vec
https://github.com/IntuitionEngineeringTeam/chars2vec
https://arxiv.org/abs/1708.00524
Deepmoji
hugging face on emotions
Another great emoji paper, how to get vector representations from
3. What can we learn from emojis (deep moji)
Learning millions of
medium
EMOJI2VEC
nother paper on classifying tweets using emojis
Group2vec
medium
train loss
Removing ‘gender bias using pair mean pca
KPCA w2v approach on a very small dataset
similar git
paper
The best w2v/tfidf/bow/ embeddings post ever
Chris mccormick ml on w2v,
http://mccormickml.com/2017/01/11/word2vec-tutorial-part-2-negative-sampling/
Chris mccormick on negative sampling and hierarchical soft max
Great W2V tutorial
gensim-based w2v tutorial
Clustering using gensim word2vec
Yet another w2v medium explanation
Negative sampling, why does it work in w2v - didnt read
Semantic contract using w2v/ft - he chose a good food category and selected words that worked best in order to find similar words to good bad etc. lior magen
Semantic contract, syn-antonym DS, using w2v, a paper that i havent read
Amazing w2v most similar tutorial, examples for vectors, misspellings, semantic contrast and relations that may or may not be captured in the network.
Followup tutorial about genderfying words using ‘he’ ‘she’ similarity
🅰️
W2v Analogies using predefined anthologies of the
Nlpforhackers on bow, w2v embeddings with code on how to use
Hebrew word embeddings with w2v, ron shemesh, on wiki/twitter
W2v vs glove vs fasttext, in terms of overfitting and what is the idea behind
W2v against glove performance
How glove and w2v work, but the following has a very good description
Glove vs w2v, concise explanation
Fasttext - using fast text and upsampling/oversapmling on twitter data
A great youtube lecture 9m about ft, rarity, loss, class tree speedup
A thorough tutorial about what is FT and how to use it, performance, pros and cons.
Docs
Medium: word embeddings with w2v and fast text in gensim
fasttext docs
Alternative to gensim
Comparison of usage w2v fasttext
Using gensim fast text - recommendation against using the fb version
A comparison of w2v vs ft using gensim
Syntactic
Semantic
this article
Skip gram vs CBOW
Paper
Medium on w2v/fast text ‘most similar’ words with code
keras/tf code for a fast text implementation
Medium on fast text and imbalance data
Sentence encoder, w2v, Fast text for sentiment
Blog
github
Gensim implementation of sent2vec
Git implementation
Another git - worked
Git notebook, usage and sentence similarity benchmark / visualization
Sentence similarity
Paragraph2VEC by stanford
Shuffle before training each
Bert
https://github.com/CyberZHG/keras-bert