
Foundation NLP


Basic nlp

  1. d2v - tutorial with code and notebook.

    1. Logistic regression with word ngrams

    2. Logistic regression with character ngrams

    3. Logistic regression with word and character ngrams (a minimal sketch of this baseline appears after this list)

    4. Recurrent neural network (bidirectional GRU) without pre-trained embeddings

    5. Recurrent neural network (bidirectional GRU) with GloVe pre-trained embeddings

    6. Multi channel Convolutional Neural Network

    7. RNN (Bidirectional GRU) + CNN model

  2. LexNLP - information extraction for legal and regulatory text.
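A minimal sketch of the word + character n-gram logistic-regression baseline from the d2v tutorial list above (scikit-learn; the toy texts and labels here are placeholders, not from the tutorial):

```python
# Hedged sketch: word + char n-gram TF-IDF features into logistic regression (items 1-3 above).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

texts = ["great movie, loved it", "terrible plot and acting", "what a wonderful film", "awful, boring, bad"]
labels = [1, 0, 1, 0]  # toy sentiment labels, purely illustrative

model = Pipeline([
    ("features", FeatureUnion([
        ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
        ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(texts, labels)
print(model.predict(["loved the acting"]))
```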

Chunking

NLP for hackers tutorials

Synonyms

For a given word, using the Vocabulary library, you can get its (a WordNet-based sketch of similar lookups follows this list):

  • Meaning

  • Synonyms

  • Antonyms

  • Part of speech: whether the word is a noun, an interjection, an adverb, etc.

  • Translate: translate a phrase from a source language to the desired language.

  • Usage example: a quick example of how to use the word in a sentence.

  • Pronunciation

  • Hyphenation: shows the particular stress points (if any)
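The page also compares Vocabulary against WordNet; as a hedged illustration, the same kinds of lookups can be done with NLTK's WordNet (assumes the wordnet corpus can be downloaded):

```python
# Meanings, synonyms, antonyms, POS, and hypernyms via NLTK's WordNet
# (an illustrative alternative to the Vocabulary library).
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

synsets = wn.synsets("good")
print(synsets[0].definition())                                                 # meaning
print({lemma.name() for s in synsets for lemma in s.lemmas()})                 # synonyms
print({a.name() for s in synsets for l in s.lemmas() for a in l.antonyms()})   # antonyms
print(synsets[0].pos())                                                        # part of speech, e.g. 'n'
print(wn.synsets("dog")[0].hypernyms())                                        # broader concepts (hypernyms)
```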

Swiss army knife libraries

Collocation

Language detection

Stemming

How to measure a stemmer?

Phrase modelling

Phrase modeling is another approach to learning combinations of tokens that together represent meaningful multi-word concepts. We can develop phrase models by looping over the words in our reviews and looking for words that co-occur (i.e., appear one after another) much more frequently than we would expect by random chance. The formula our phrase models use to determine whether two tokens A and B constitute a phrase is:

(count(A B) − count_min) / (count(A) × count(B)) × N > threshold
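A minimal sketch of the same scoring using gensim's Phrases, which implements this formula (the toy sentences are placeholders, and parameter defaults vary across gensim versions):

```python
# Bigram phrase detection with gensim; min_count plays the role of count_min and
# threshold is the cutoff on the score defined above.
from gensim.models.phrases import Phrases

sentences = [
    ["new", "york", "is", "big"],
    ["i", "love", "new", "york"],
    ["new", "york", "pizza", "is", "great"],
]

bigram = Phrases(sentences, min_count=1, threshold=0.5)
print(bigram[["i", "moved", "to", "new", "york"]])  # frequent pairs are joined, e.g. "new_york"
```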

Document classification

Hebrew NLP tools

Semantic roles:

Using nltk or Stanford POS taggers: creating features from the actual words (manual stemming, etc.) and using the tags as labels for a random forest, thus creating a POS classifier of our own. Not entirely sure why we need to create a classifier from a “classifier”.
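A hedged sketch of that idea: take tags from NLTK's treebank sample as labels, build simple hand-crafted word features, and fit a random forest (the feature names here are illustrative):

```python
import nltk
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer

nltk.download("treebank", quiet=True)

def word_features(word):
    # hand-crafted features standing in for "manual stemming, etc."
    return {"lower": word.lower(), "suffix2": word[-2:], "is_title": word.istitle(), "is_digit": word.isdigit()}

# flatten a small slice of the tagged corpus into (word, tag) pairs
tagged = [(w, t) for sent in nltk.corpus.treebank.tagged_sents()[:500] for (w, t) in sent]

vec = DictVectorizer()
X = vec.fit_transform([word_features(w) for w, _ in tagged])
y = [t for _, t in tagged]

clf = RandomForestClassifier(n_estimators=50).fit(X, y)
print(clf.predict(vec.transform([word_features("running")])))  # predicted POS tag for a single word
```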

- POS, lemmatization, synonyms, antonyms, hypernyms, hyponyms

- Sentence similarity using a synonyms cumsum for comparison; today replaced with mean-w2v sentence similarity.

- Stemmers are faster; lemmatizers are POS/dictionary based and slower, converting words to their base form.
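A quick illustration of the difference (NLTK; assumes the wordnet data can be downloaded for the lemmatizer):

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet", quiet=True)

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["studies", "running", "better"]:
    print(word, "->", stemmer.stem(word), "|", lemmatizer.lemmatize(word, pos="v"))
# Stemming just chops suffixes ("studies" -> "studi"); lemmatization returns a dictionary base form
# ("running" -> "run"), but needs the right POS hint to work well.
```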

- Shallow parsing (as opposed to deep parsing), similar to NER.

Using nltk chunking as a labeller to train a classifier of our own: using IOB features, as well as others, to create a new NER classifier which should be better than the original thanks to the additional features. Also uses a new English dataset, GMB.
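A small sketch of getting IOB tags out of NLTK's built-in chunker, which can then serve as labels or extra features for a custom NER model (assumes NLTK's tokenizer, tagger, and chunker data can be downloaded):

```python
import nltk

for pkg in ["punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"]:
    nltk.download(pkg, quiet=True)

sent = "George Washington lived in Mount Vernon, Virginia."
tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent)))
print(nltk.chunk.tree2conlltags(tree))
# e.g. [('George', 'NNP', 'B-PERSON'), ('Washington', 'NNP', 'I-PERSON'), ('lived', 'VBD', 'O'), ...]
# exact labels depend on NLTK's pretrained chunker
```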

corpuses

A Python module to get meanings, synonyms, and more for a given word, using Vocabulary (also a comparison against WordNet).

textacy is a Python library for performing a variety of natural language processing (NLP) tasks, built on the high-performance spaCy library. With the fundamentals (tokenization, part-of-speech tagging, dependency parsing, etc.) delegated to another library, textacy focuses on the tasks that come before and follow after.
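A tiny hedged example of textacy on top of spaCy (the module layout varies across textacy versions; assumes the en_core_web_sm model is installed):

```python
import textacy
from textacy import extract

doc = textacy.make_spacy_doc(
    "textacy builds on spaCy, which was created by Explosion in Berlin.",
    lang="en_core_web_sm",
)
print(list(extract.ngrams(doc, 2, filter_stops=True)))  # candidate bigrams, stopwords filtered
print(list(extract.entities(doc)))                      # named entities from spaCy's pipeline
```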

What is collocation? “The habitual juxtaposition of a particular word with another word or words with a frequency greater than chance.” A Medium post, quite good, comparing freq/t-test/PMI/chi2, with GitHub code.

A website dedicated to collocations: methods, references, metrics.

A tutorial with chi2 (IG?).

Text2vec in R - has ideas on how to use collocations for downstream tasks (LDA, W2V, etc.); also explains PMI and other metrics. Note that the gensim metric is unsupervised and probabilistic.

NLTK on collocations.

A blog post about keeping or removing stopwords for collocation; useful but no firm conclusion. IMO we should remove them beforehand.

A blog post with code for nltk-based collocation.

Small code for using nltk

Another code / score example for nltk

A Jupyter notebook on manually finding collocations - not useful.

Paper: Ngram2Vec - we introduce ngrams into four representation methods. The experimental results demonstrate ngrams' effectiveness for learning improved word representations. In addition, we find that the trained ngram embeddings are able to reflect their semantic meanings and syntactic patterns. To alleviate the costs brought by ngrams, we propose a novel way of building the co-occurrence matrix, enabling the ngram-based models to run on cheap hardware.

YouTube videos on bigrams, collocation, mutual information, and more.
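A minimal sketch of the NLTK-based collocation scoring referenced above, using PMI (other measures from the comparison, like the t-test and chi2, are one attribute away):

```python
import nltk
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

nltk.download("genesis", quiet=True)

bigram_measures = BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(nltk.corpus.genesis.words("english-web.txt"))
finder.apply_freq_filter(3)                      # drop rare pairs before scoring
print(finder.nbest(bigram_measures.pmi, 10))     # top-10 bigrams by pointwise mutual information
# Other scorers mentioned in the comparison above: bigram_measures.student_t, .chi_sq, .raw_freq
```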

langdetect (a port of Google's language-detection library) - 55 languages: af, ar, bg, bn, ca, cs, cy, da, de, el, en, es, et, fa, fi, fr, gu, he, hi, hr, hu, id, it, ja, kn, ko, lt, lv, mk, ml, mr, ne, nl, no, pa, pl, pt, ro, ru, sk, sl, so, sq, sv, sw, ta, te, th, tl, tr, uk, ur, vi, zh-cn, zh-tw
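A hedged sketch assuming the langdetect package (the Python port of Google's language-detection library covering these 55 languages):

```python
from langdetect import DetectorFactory, detect, detect_langs

DetectorFactory.seed = 0  # the detector is probabilistic; fix the seed for repeatable output

print(detect("Machine learning is fun"))        # e.g. 'en'
print(detect_langs("אני אוהב עיבוד שפה טבעית"))  # e.g. [he:0.999...]
```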

References on measuring a stemmer [ (apr11), (Index Compression Factor, ICF) ].

Phrase Modeling - using gensim and spaCy (last updated ~7 years ago).


• Benchmarking tokenizers for optimal processing speed
• Using nltk with gensim
• Multiclass text classification with svm/nb/mean w2v/
• Basic pipeline for keyword extraction
• DL for text classification
• Glorified regex extractor
• Coding Chunkers as Taggers: IO, BIO, BMEWO, and BMEWO+
• How to convert between verb/noun/adjective/adverb forms using WordNet
• Complete guide for training your own Part-Of-Speech Tagger
• Penn Treebank tagset
• WordNet introduction
• Sentence similarity using WordNet
• Stemmers vs lemmatizers
• Chunking
• NER
• Building NLP pipelines, functions, coroutines, etc.
• Training NER using generators
• Metrics: tp/fp, recall/precision, micro/weighted/macro F1
• Tf-idf
• NLTK for beginners
• NLP corpora
• BOW/bigrams
• TextRank
• Word cloud
• Topic modelling using gensim: LSA, LSI, LDA, HDP
• spaCy full tutorial
• POS using CRF
• https://vocabulary.readthedocs.io/en/…
• textacy
• tutorial
• collocations
• Text analysis for sentiment, doing feature selection
• Part 2, with bi-gram collocation in nltk
• Text2vec
• collocations
• blog post
• blog post
• collocation
• collocation
• Manually finding collocations
• Ngram2Vec
• GitHub
• bigrams
• collocation
• collocation
• Using google lang detect (1, 2, 3, 4, 5)
• Phrase Modeling
• SO on PE.
• Using hierarchical attention network
• HebMorph
• Hebmorph elastic search
• Hebmorph blog post
• blog posts
• youtube
• Awesome hebrew nlp git
• git
• Hebrew-nlp service: docs, the features (morphological analysis, normalization, etc.), git
• Apache solr stop words (dead)
• SO on hebrew analyzer/stemming (here too)
• Neural sentiment benchmark using two algorithms, for character- and word-level LSTM/GRU (the paper)
• Hebrew word embeddings
• Paper for rich morphological datasets for comparison - Rivlin
• Semantic roles: http://language.worldofcomputing.net/semantics/semantic-roles.html