Machine & Deep Learning Compendium

Data Science Tools


Last updated 3 years ago


Python

  1. How to know pip packages size - good for removal

  2. Coroutines

Async io

Clean code:

Virtual Environments

PYENV

  1. pyenv virtualenv

JUPYTER

  • Jupyter notebooks as a module

    1. Enter your project directory

    2. $ python -m venv projectname

    3. $ source projectname/bin/activate

    4. (venv) $ pip install ipykernel

    5. (venv) $ ipython kernel install --user --name=projectname

    6. Run jupyter notebook (it is unclear how this behaves when several notebook processes are already running; it may be possible to reuse the same server)

    7. Connect to the new server at port 8889
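The environment-creation step above can also be done from Python itself. This is a minimal sketch using only the standard library `venv` module; the helper names `create_project_env` and `kernel_install_commands` and the name `projectname` are illustrative assumptions, and the kernel-registration commands are only built as lists here, not executed.

```python
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir, name="projectname"):
    """Create a virtual environment called `name` inside `project_dir`."""
    env_dir = Path(project_dir) / name
    # with_pip=False keeps this sketch offline; `python -m venv` installs pip by default.
    venv.create(env_dir, with_pip=False)
    return env_dir


def kernel_install_commands(name="projectname"):
    """Commands for steps 4-5, meant to be run with the activated environment's python."""
    return [
        ["python", "-m", "pip", "install", "ipykernel"],
        ["python", "-m", "ipykernel", "install", "--user", f"--name={name}"],
    ]


# Demo in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_project_env(tmp)
    created = (env_dir / "pyvenv.cfg").exists()

print(created)
print(kernel_install_commands())
```

Each command list could be passed to `subprocess.run` once the environment is active; registering the kernel under the project name is what makes it selectable from the notebook UI.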

As far as I can tell, reshape effectively flattens the tree and divides it again into a new tree, but the total number of elements has to stay the same. For example, 2*4*6 = 4*2*3*2 = 48.

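code:

    import numpy

    rng = numpy.random.RandomState(234)
    a = rng.randn(2, 3, 10)           # 2 * 3 * 10 = 60 elements
    print(a.shape)                    # (2, 3, 10)
    print(a)
    b = numpy.reshape(a, (3, 5, -1))  # -1 lets numpy infer the last axis: 60 / (3 * 5) = 4
    print(b.shape)                    # (3, 5, 4)
    print(b)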
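The "flatten, then divide again" reading of reshape can be checked directly. The sketch below uses the 2*4*6 = 4*2*3*2 example from the text; the array contents and variable names are arbitrary choices for illustration.

```python
import numpy as np

a = np.arange(48).reshape(2, 4, 6)   # 2 * 4 * 6 = 48 elements
b = a.reshape(4, 2, 3, 2)            # 4 * 2 * 3 * 2 = 48 elements

# Reshape keeps the flattened (row-major) order of the elements.
same_order = np.array_equal(a.ravel(), b.ravel())

# -1 asks NumPy to infer the one missing dimension: 48 / (3 * 4) = 4.
c = a.reshape(3, 4, -1)

# A target shape with a different element count is rejected.
try:
    a.reshape(5, 5)
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(b.shape, c.shape, same_order, mismatch_rejected)
```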

SCIPY

NUMPY

PANDAS

  1. def mask_with_values(df):
         mask = df['A'].values == 'foo'
         return df[mask]
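A small usage sketch for the masking function above, next to the more common Series-based filter. The sample DataFrame and the name `mask_standard` are assumptions for illustration; no timing is measured here, only that both approaches select the same rows.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "bar", "foo", "baz"], "B": [1, 2, 3, 4]})


def mask_with_values(df):
    # Compare on the underlying array instead of the Series.
    mask = df["A"].values == "foo"
    return df[mask]


def mask_standard(df):
    # The usual boolean-Series filter, for comparison.
    return df[df["A"] == "foo"]


via_values = mask_with_values(df)
via_series = mask_standard(df)
print(via_values)
print(via_values.equals(via_series))
```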

    1. Using python (map)

    2. Using numpy

    3. using a function (not as pretty)
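The three approaches listed above can be sketched side by side, assuming the task is deriving a new column from a boolean column plus a calculation. The column names and the 50% discount rule are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"is_member": [True, False, True], "price": [10.0, 20.0, 30.0]})

# 1. Using Python's built-in map over the two columns.
df["final_map"] = list(
    map(lambda m, p: p * 0.5 if m else p, df["is_member"], df["price"])
)

# 2. Using numpy: np.where(condition, value_if_true, value_if_false).
df["final_np"] = np.where(df["is_member"], df["price"] * 0.5, df["price"])


# 3. Using a function applied to every row (more verbose, not as pretty).
def discounted(row):
    return row["price"] * 0.5 if row["is_member"] else row["price"]


df["final_func"] = df.apply(discounted, axis=1)
print(df)
```

All three columns hold the same values; the numpy version works on whole columns at once, while the other two call Python code once per row.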

    1. df['t'] = [x for x in range(10)]

    2. df['t-1'] = df['t'].shift(1)

    3. df['t+1'] = df['t'].shift(-1)
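To see where the NaN values land after shifting, here is a short runnable sketch on a five-row frame. The column names `t-1` and `t+1` are naming choices for the lagged and lead copies.

```python
import pandas as pd

df = pd.DataFrame({"t": range(5)})
df["t-1"] = df["t"].shift(1)    # pushed forward: NaN in the first row
df["t+1"] = df["t"].shift(-1)   # pulled back: NaN in the last row

# Rows that have both a previous and a next value.
complete = df.dropna()
print(df)
print(complete)
```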

Exploratory Data Analysis (EDA)

  1. The system is built around quickly visualizing target values and comparing datasets. Its goal is to help quick analysis of target characteristics, training vs testing data, and other such data characterization tasks.

TIMESERIES

  1. SCI-KIT LEARN

FAST.AI

PYCARET

NVIDIA TF CUDA CUDNN

GCP

GIT / Bitbucket

complementary to the above

- a make sense tutorial and instructions on how to use all.

by alfredo motta

by Christine Egan


Important

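Debugging in Jupyter, how? - put a one-liner before the code and query the variables inside a function.

Nbdev, on fast.ai

How does reshape work? - a shape of (2,4,6) is like a tree with 2 branches, each splitting into 4, and each of those holding 6 leaves.

A tutorial for Google Colaboratory - free Tesla K80 with Jup-notebook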

How to add extensions to jupyter:

Optimization problems, a nice tutorial on finding the minima

Minima / maxima - finding them in a 1d numpy array
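A minimal NumPy-only sketch of finding minima in a 1d array. The sample values are made up, and "local minimum" is taken here to mean an interior point strictly smaller than both neighbours, which is one possible definition among several.

```python
import numpy as np

y = np.array([3.0, 1.0, 2.0, 0.5, 4.0, 4.5, 2.0, 3.0])

# Global minimum: index of the smallest value.
global_idx = int(np.argmin(y))

# Interior local minima: strictly smaller than the left and right neighbour.
interior = (y[1:-1] < y[:-2]) & (y[1:-1] < y[2:])
local_idx = np.flatnonzero(interior) + 1  # +1 because the slice starts at index 1

print(global_idx, local_idx.tolist())
```

Swapping `<` for `>` (and `argmin` for `argmax`) gives the corresponding maxima.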

Using numpy efficiently - explaining why vectors work faster. Fast vector calculation, a benchmark between list, map and vectorize. Vectorize wins. The idea is to use vectorize with a function that may involve if conditions on a vector, and do it as fast as possible.
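The list / map / vectorize comparison can be sketched as follows. The function `clip_negative` and the sample values are hypothetical, and no timings are claimed. A `np.where` variant is included as an extra that is not part of the original comparison.

```python
import numpy as np


def clip_negative(x):
    # Scalar function with an if-condition.
    if x < 0:
        return 0.0
    return x * 2.0


values = np.array([-1.5, 0.0, 2.0, -3.0, 4.5])

with_list = np.array([clip_negative(v) for v in values])
with_map = np.array(list(map(clip_negative, values)))
with_vectorize = np.vectorize(clip_negative, otypes=[float])(values)

# Pure array expression: the condition is evaluated on the whole vector at once.
with_where = np.where(values < 0, 0.0, values * 2.0)

print(with_vectorize)
```

The NumPy documentation describes `np.vectorize` as provided primarily for convenience rather than performance, so it is worth timing it against an array expression such as `np.where` on your own data before settling on one.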

Great introductory tutorial about using pandas: loading, loading from zip, seeing the table’s features, accessing rows & columns, boolean operations, calculating on a whole row or column with a simple function (even on two columns), and dealing with time and date parsing.

Visualizing pandas pivoting and reshaping functions by Jay Alammar - pivot, melt, stack, unstack
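A compact sketch of the four reshaping functions named above on a tiny table. The city/year/sales data is invented for illustration.

```python
import pandas as pd

long_df = pd.DataFrame({
    "city": ["A", "A", "B", "B"],
    "year": [2020, 2021, 2020, 2021],
    "sales": [10, 12, 7, 9],
})

# pivot: long -> wide, one row per city and one column per year.
wide = long_df.pivot(index="city", columns="year", values="sales")

# melt: wide -> long again.
melted = wide.reset_index().melt(id_vars="city", var_name="year", value_name="sales")

# stack: move the column level into the row index (gives a Series here).
stacked = wide.stack()

# unstack: move the innermost row-index level back into columns.
unstacked = stacked.unstack()

print(wide)
print(melted)
```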

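The fastest way to select rows by columns, by using masked values (benchmarked):

Accessing dataframe rows, columns and cells - by name, by index, by python methods.

Dealing with time series in pandas

Create a new column based on a (boolean or not) column and calculation: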

Given a DataFrame, the shift() function can be used to create copies of columns that are pushed forward (rows of NaN values added to the front) or pulled back (rows of NaN values added to the end).

Dataframe Validation In Python - A Practical Introduction - Yotam Perkal - PyCon Israel 2018

In this talk, I will present the problem and give a practical overview (accompanied by Jupyter Notebook code examples) of three libraries that aim to address it: Voluptuous, which uses schema definitions in order to validate data [https://github.com/alecthomas/voluptuous]; Engarde, a lightweight way to explicitly state your assumptions about the data and check that they're actually true [https://github.com/TomAugspurger/engarde]; and TDDA, Test Driven Data Analysis [https://github.com/tdda/tdda]. By the end of this talk, you will understand the importance of data validation and get a sense of how to integrate data validation principles as part of the ML pipeline.

Stop using iterrows, use apply.
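A minimal sketch of the advice above: the same row-wise sum written with `iterrows`, with `apply`, and as plain column arithmetic. The sample frame is hypothetical and nothing is timed here.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Row-by-row loop with iterrows (the pattern to avoid).
loop_result = []
for _, row in df.iterrows():
    loop_result.append(int(row["a"] + row["b"]))

# Same calculation with apply along rows.
apply_result = df.apply(lambda row: row["a"] + row["b"], axis=1)

# Column arithmetic, which avoids Python-level iteration entirely.
vector_result = df["a"] + df["b"]

print(loop_result, apply_result.tolist(), vector_result.tolist())
```

When the calculation can be written as column arithmetic, that form is usually the simplest of the three; `apply` is the fallback for logic that genuinely needs the whole row.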

Sweetviz - "Sweetviz is an open-source Python library that generates beautiful, high-density visualizations to kickstart EDA (Exploratory Data Analysis) with just two lines of code. Output is a fully self-contained HTML application."

(good)

Pipeline to json 1, 2

cuML - Multi gpu, multi node-gpu alternative for SKLEARN algorithms

Awesome code examples about using SVM, KNN, naive Bayes and logistic regression in sklearn in python, i.e., “fitting a model onto the data”
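"Fitting a model onto the data" with the four model families mentioned above can be sketched like this. The synthetic dataset, the split and the hyperparameters are assumptions chosen only to keep the example self-contained; this is not taken from the linked examples.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic binary classification data, so nothing has to be downloaded.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "svm": SVC(),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "naive_bayes": GaussianNB(),
    "log_regression": LogisticRegression(max_iter=1000),
}

# Every estimator exposes the same fit / score interface.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

print(scores)
```

Because all estimators share `fit`, `predict` and `score`, swapping one model for another only changes the constructor line.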

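Parallelism of numpy, pandas and sklearn using dask and clusters. Webpage, docs, example in jupyter.

Also insanely fast, see here.

Functional api for sk learn, using pipelines. Thank you sk-lego.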

Images by

Medium on all fast.ai courses, 14 posts

PyCaret - is an open-source machine learning library in Python that helps you from data preparation to model deployment. It is easy to use and you can do almost every data science project task with just one line of code.


Failed to initialize NVML: Driver/library version mismatch


(great)

by

How to use better OOP in python.
Best practices programming python classes - a great lecture.
How to know pip packages size
Python type checking tutorial
Import click - command line interface
Concurrency vs Parallelism (great)
Async in python
Coroutines vs futures
generators async wait
Intro to concurrent.futures
Future task event loop
Intro
complete
Clean code in python git
About the book
stack overflow on pyenv / venv / etc
Guide to pyenv & pyenv virtualenv
Managing virtual env with pyenv
Just use venv
Summary on all the *envs
A really good primer on virtual environments
Introduction to venv
Pipenv
A great intro to pipenv
A complementary to pipenv above
Comparison between all *env
pyenv, virtualenv and using them with Jupyter
Create isolated Jupyter ipython kernels with pyenv and virtualenv
Jupyter Notebook in a virtual env
Installing pyenv
Intro to pyenv
Pyenv tutorial and finding where it is
Pyenv override system python on mac
Cloud GPUS cheap
Importing a notebook as a module
Colaboratory commands for Jupyter
Timing and profiling in Jupyter
Debugging in Jupyter, how?
28 tips n tricks for jupyter
Nbdev
fast.ai
jupytext
Virtual environments in jupyter
Virtual env with jupyter
How does reshape work?
Google Colaboratory - free Tesla K80 with Jup-notebook
Jupyter on Amazon AWS
extensions
Connecting from COLAB to MS AZURE
Streamlit vs. Dash vs. Shiny vs. Voila vs. Flask vs. Jupyter
Optimization problems, a nice tutorial
Minima / maxima
Using numpy efficiently
Fast vector calculation, a benchmark
Great introductory tutorial
Visualizing pandas pivoting and reshaping functions by Jay Alammar
How to beautify pandas dataframe using html display
Speeding up pandas
The fastest way to select rows by columns, by using masked values
Parallelism, pools, threads, dask
Accessing dataframe rows, columns and cells
Looping through pandas
How to inject headers into a headless CSV file
Dealing with time series
Create a new column
shift
Row and column sum in pandas and numpy
Dataframe Validation In Python
https://github.com/alecthomas/voluptuous
https://github.com/TomAugspurger/engarde
https://github.com/tdda/tdda
Stop using iterrows
(great) Group and Aggregate by One or More Columns in Pandas
Pandas Groupby: Summarising, Aggregating, and Grouping data in Python
Pandas functions you didn't know about
json_normalize()
Pandas summary
Pandas html profiling
Sweetviz
Pandas time series manipulation
Using resample
Basic TS manipulation
Fill missing ts gaps, or how to resample
Pipeline to json 1
Pipeline to json 2
cuML
Gpu TSNE ^
Awesome code examples
Parallelism of numpy, pandas and sklearn using dask and clusters
Webpage
docs
example in jupyter
see here
Functional api for sk learn
SK-Lego
Medium
1. What is? by vidhaya
PyCaret
Install TF
Install cuda on ubuntu
official linux
Replace cuda version
Cuda 9 download
Install cudnn
Installing everything easily
Failed
Resize google disk size
1,
https://www.cloudbooklet.com/how-to-resize-disk-of-a-vm-instance-in-google-cloud/
understanding git
pre-commit
Rewrite git history, all the commands
Installing git LFS
Use git lfs
Download git-lfs
Git wip
Carolyn Van Slyck
by Jeremy Chow
by
Sweetviz