Machine & Deep Learning Compendium
Data Science

Last updated 3 years ago

LIFE CYCLE

- "The Team Data Science Process (TDSP) is an agile, iterative data science methodology to deliver predictive analytics solutions and intelligent applications efficiently. TDSP helps improve team collaboration and learning by suggesting how team roles work best together. TDSP includes best practices and structures from Microsoft and other industry leaders to help toward successful implementation of data science initiatives. The goal is to help companies fully realize the benefits of their analytics program.

This article provides an overview of TDSP and its main components. We provide a generic description of the process here that can be implemented with different kinds of tools. A more detailed description of the project tasks and roles involved in the lifecycle of the process is provided in additional linked topics. Guidance on how to implement the TDSP using a specific set of Microsoft tools and infrastructure that we use to implement the TDSP in our teams is also provided."

by

"When I used to do consulting, I’d always seek to understand an organization’s context for developing data projects, based on these considerations:

  • Strategy: What is the organization trying to do (objective) and what can it change to do it better (levers)?

  • Data: Is the organization capturing necessary data and making it available?

  • Analytics: What kinds of insights would be useful to the organization?

  • Implementation: What organizational capabilities does it have?

  • Maintenance: What systems are in place to track changes in the operational environment?

  • Constraints: What constraints need to be considered in each of the above areas?"

WORKFLOWS

PLATFORMS

STACK

Being a DS / Researcher

Team Building / Group Cohesion

References:

Culture

Agile for data-science-research

SOTA AND CURRENT TRENDS SUMMARIES

Building Data/DS teams

YOUTUBE COURSES

Deep learning Course

Machine Learning Courses

NLP Courses

Predictive Analytics Course

BOOKS & NOTEBOOKS

COST

Patents

General Advice

ML systems are more than ML code.

Business KPIs are not research KPIs, etc.

Full stack DS

by Uri Weiss. Wrong credits? Please contact me.

- The most diagram-intensive post ever. This is the mother lode of figure references.

1, 2, 3, 4, 5, 6, 7, 8, 9, 10

(good advice)

culture

- "netflixs-keeper-test-is-the-secret-to-a-successful-workforce"

by Aviran Mordo

- linear algebra

- Histograms of (image distribution - mean distribution) / std dev look quite good.

#2

101

gensim notebooks


lena

notebooks!

DP1 (transform): Moving an ML model to production is much easier if you keep inputs, features, and transforms separate.

DP2 (checkpoints): Saving the intermediate weights of your model during training provides resilience, generalization, and tunability.

DP3 (virtual epochs): Base machine learning model training and evaluation on the total number of examples, not on epochs or steps.

DP4 (keyed predictions): Export your model so that it passes client-supplied keys through to its predictions.

DP5 (repeatable sampling): Use the hash of a well-distributed column to split your data into training, validation, and testing sets.
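
The hash-based split described above can be sketched roughly as follows. This is a minimal illustration, not the implementation from the linked design-patterns material: the function name `assign_split`, the `flight_date` column, and the 80/10/10 proportions are all assumptions made for the example. MD5 is used only because it gives a stable digest across runs and machines (Python's built-in `hash` is salted per process for strings), not for any security property.

```python
import hashlib


def assign_split(key: str, train_pct: int = 80, valid_pct: int = 10) -> str:
    """Deterministically map a key to 'train', 'valid', or 'test'.

    The same key always lands in the same split, so rows that share a key
    (for example, all records from one date) never leak across splits.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 99]
    if bucket < train_pct:
        return "train"
    if bucket < train_pct + valid_pct:
        return "valid"
    return "test"


# Hypothetical rows; 'flight_date' stands in for the well-distributed column.
rows = [{"flight_date": f"2019-01-{d:02d}", "delay": d % 7} for d in range(1, 29)]

splits = {"train": [], "valid": [], "test": []}
for row in rows:
    splits[assign_split(row["flight_date"])].append(row)
```

Because the assignment depends only on the key, rerunning the split on a refreshed or reshuffled dataset reproduces the same partition. The column should have many distinct values and should not itself be a model input, otherwise the split sizes drift from the target proportions or the model loses a feature.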

- From word2vec and doc2vec to NMF, LDA, PCA, the sklearn API, cosine similarity, topic modeling, t-SNE, etc.

- François Chollet, deep learning & vision.

Yandex school

(i.e., data science)

(really good) - distributions, outliers, examples, slices, metric significance, consistency over time, validation, description, evaluation, robustness in measurement, reproducibility, etc.

Google’s famous MLops
Fast ai project checklist
kaggle
Uber, google, netflix, airbnb, etc
Medium on canonical stack
A day in a life
Advice for a ds
Review of deep learning papers and co authorship
Uri Weiss
Uri Weiss
please contact me
ML practices for a DS
DS vs DA vs MLE
1
2
3
4
5
6
7
8
9
Why data science needs generalists not specialists
Building a DS function (team)
Netflix
Reed Hastings on Netflix's keeper test
response 1
How to manage a data science research team using agile methodology, not scrum and not kanban
Workflow for data science research projects
Tips for data science research management
IMO a really bad implementation of agile for data-science-projects
ICLR 2019
Medium
State of ai, a yearly report
(great) the data team a short story by erik bern
Guilds / Gangs / Squads
Squads, Tribes, Guilds, don't be like Spotify
Discover the Spotify Model
DEEPNET.TV YOUTUBE (excellent)
Mitchell ML Lectures (too long)
Quoc Le (Google) wrote DNN tutorials and a 3-hour video (not intuitive)
KDnuggets: NumPy, pandas, scikit-learn tutorials.
Deep learning online book (too wordy)
Genetic algorithms - a hyperparameter search that beats brute-force grid search, obviously
CNN tutorial
Introduction to programming in scikit
SVM in scikit python
Sklearn scipy PCA tutorial
RNN
Matrix Multiplication
Kadenze - deep learning with TensorFlow
Deep learning with Keras
Recommended: Udacity includes ML and DL
Week1: Introduction Lesson 4: Supervised, unsupervised.
Lesson 6: model regression, cost function
Lesson 71: optimization objective, large margin classification
PCA at coursera #1
PCA at coursera
PCA #3
SVM at coursera #1 - simplified
spacy
gensim
2
nltk
2
yandex
voita
Syllabus
Week 2: Lesson 29: supervised learning
Lesson 36: From rules to trees
Lesson 43: overfitting, then validation, then accuracy
Lesson 46: bootstrap, bagging, boosting, random forests.
Lesson 52: NN
Lesson 55: Gradient Descent
Lesson 59: Logistic regression, SVM, Regularization, Lasso, Ridge regression
Lesson 64: gradient descent, stochastic, parallel, batch.
Unsupervised: Lesson X K-means, DBscan
Machine learning design patterns
git
medium
DP1 - transform
DP2 - checkpoints
DP3 - virtual epochs
DP4 - keyed predictions
DP5 - repeatable sampling
Gensim notebooks
Deep learning with python
git notebooks!
official notebooks
nlp notebooks
Machine learning engineering book
Interpretable Machine Learning book
GPT2/3
Method Patent Exceptionalism
Practical advice for analysis of large, complex data sets
Microsoft on Team DS Lifecycle
The DS lifecycle, Microsoft Documentation
Google
Google