Data Science Tools

Python

Async io

Clean code:

Virtual Environments

PYENV

  1. pyenv virtualenv

JUPYTER

How does reshape work? - a shape of (2,4,6) is like a tree: 2 branches, each with 4 branches, each of which has 6 leaves.

As far as I can tell, reshape effectively flattens the tree and divides it again into a new tree, but the total number of elements needs to stay the same, for example 2*4*6 = 4*2*3*2.

code:

    import numpy

    rng = numpy.random.RandomState(234)
    a = rng.randn(2, 3, 10)
    print(a.shape)  # (2, 3, 10)
    print(a)

    # -1 lets numpy infer the last dimension: 60 / (3 * 5) = 4
    b = numpy.reshape(a, (3, 5, -1))
    print(b.shape)  # (3, 5, 4)
    print(b)
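The "flatten the tree, then split it again" picture can be checked directly. This is only a sketch: the array contents and the variable names (`a`, `b`, `c`, `same_order`) are made up for illustration, and it assumes numpy's default row-major order.

```python
import numpy as np

# A (2, 4, 6) array: 2 blocks, each with 4 rows of 6 leaves (48 values in total).
a = np.arange(2 * 4 * 6).reshape(2, 4, 6)

# Any new shape with the same number of elements is allowed: 4*2*3*2 == 48.
b = a.reshape(4, 2, 3, 2)

# -1 asks numpy to infer the remaining dimension: 48 / (3 * 8) == 2.
c = a.reshape(3, 8, -1)

# Flattening (default row-major order) yields the same sequence for all three,
# so reshape only changes how the flat sequence is split up.
same_order = np.array_equal(a.ravel(), b.ravel()) and np.array_equal(a.ravel(), c.ravel())
print(b.shape, c.shape, same_order)

# A shape whose product differs from 48 is rejected.
try:
    a.reshape(5, 10)
except ValueError as err:
    print("reshape failed:", err)
```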

*** A tutorial for Google Colaboratory - free Tesla K80 with a Jupyter notebook

Jupyter on Amazon AWS

How to add extensions to jupyter: extensions

Connecting from COLAB to MS AZURE

Streamlit vs. Dash vs. Shiny vs. Voila vs. Flask vs. Jupyter

SCIPY

  1. Finding minima / maxima in a 1D numpy array
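One possible way to do this, which may not be what the linked post uses, is `scipy.signal.argrelextrema`. The sample array below is made up for illustration.

```python
import numpy as np
from scipy.signal import argrelextrema

x = np.array([0, 2, 1, 3, 0, 1, 5, 2])

# argrelextrema returns a tuple with one index array per dimension,
# so [0] picks the indices for our 1D input.
maxima_idx = argrelextrema(x, np.greater)[0]  # strictly greater than both neighbours
minima_idx = argrelextrema(x, np.less)[0]     # strictly smaller than both neighbours

print("local maxima at", maxima_idx, "values", x[maxima_idx])
print("local minima at", minima_idx, "values", x[minima_idx])
```

With the strict comparators used here, the first and last elements are never reported as extrema.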

NUMPY

Using numpy efficiently - explaining why vectors work faster. Fast vector calculation: a benchmark between list, map and vectorize, in which vectorize wins. The idea is to wrap a function that may involve if conditions with vectorize and apply it to a whole vector as fast as possible.
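A small sketch of the options being compared, using a made-up scalar function (`clip_negative` is hypothetical, not taken from the linked benchmark). Note that the numpy documentation describes `np.vectorize` as a convenience wrapper that is essentially a for loop, so it is worth timing it on your own data against a pure array expression such as `np.where`.

```python
import numpy as np

def clip_negative(x):
    # Hypothetical scalar function with an if condition.
    if x < 0:
        return 0.0
    return x * 2.0

values = np.array([-1.5, 0.5, 2.0, -3.0])

# Option 1: a plain Python list comprehension.
from_list = np.array([clip_negative(v) for v in values])

# Option 2: np.vectorize wraps the scalar function so it accepts arrays.
# otypes fixes the output dtype instead of inferring it from the first result.
from_vectorize = np.vectorize(clip_negative, otypes=[float])(values)

# Option 3: express the condition with array operations only.
from_where = np.where(values < 0, 0.0, values * 2.0)

print(from_list, from_vectorize, from_where)
```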

PANDAS

  1. Great introductory tutorial about using pandas: loading data, loading from zip, inspecting the table's features, accessing rows & columns, boolean operations, applying a simple function to a whole row/column or even to two columns, and dealing with time/date parsing.

  2. Filtering rows with a boolean mask built from the column's underlying numpy values:

         def mask_with_values(df):
             mask = df['A'].values == 'foo'
             return df[mask]

  3. Accessing dataframe rows, columns and cells - by name, by index, by python methods.

  4. Dealing with time series in pandas.

    1. Create a new column based on a (boolean or not) column and calculation:

    2. Using python (map)

    3. Using numpy

    4. Using a function (not as pretty)

  5. Given a DataFrame, the shift() function can be used to create copies of columns that are pushed forward (rows of NaN values added to the front) or pulled back (rows of NaN values added to the end).

    1. df['t'] = [x for x in range(10)]

    2. df['t-1'] = df['t'].shift(1)

    3. df['t+1'] = df['t'].shift(-1)

  6. Dataframe Validation In Python - A Practical Introduction - Yotam Perkal - PyCon Israel 2018

    In this talk, I will present the problem and give a practical overview (accompanied by Jupyter Notebook code examples) of three libraries that aim to address it:

    1. Voluptuous - uses Schema definitions in order to validate data [https://github.com/alecthomas/voluptuous]

    2. Engarde - a lightweight way to explicitly state your assumptions about the data and check that they're actually true [https://github.com/TomAugspurger/engarde]

    3. TDDA - Test Driven Data Analysis [https://github.com/tdda/tdda]

    By the end of this talk, you will understand the importance of data validation and get a sense of how to integrate data validation principles as part of the ML pipeline.
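A minimal sketch of the pandas operations listed in this section: boolean masking, creating a new column with map, numpy and a row-wise function, and shift(). The toy DataFrame and the column names (`is_foo_map`, `is_foo_np`, `is_foo_apply`) are made up for illustration and are not taken from the linked tutorials.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["foo", "bar", "foo", "baz"], "t": [0, 1, 2, 3]})

# Boolean mask built from the underlying values of column A.
foo_rows = df[df["A"].values == "foo"]

# New column based on a condition, three ways.
df["is_foo_map"] = df["A"].map(lambda v: v == "foo")                  # python map
df["is_foo_np"] = np.where(df["A"] == "foo", True, False)             # numpy
df["is_foo_apply"] = df.apply(lambda row: row["A"] == "foo", axis=1)  # row-wise function

# shift(1) pushes values forward (NaN at the front);
# shift(-1) pulls them back (NaN at the end).
df["t-1"] = df["t"].shift(1)
df["t+1"] = df["t"].shift(-1)

print(foo_rows)
print(df)
```

The row-wise `apply` version is usually the slowest of the three, which matches the "not as pretty" remark above.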

Exploratory Data Analysis (EDA)

  1. Sweetviz - "Sweetviz is an open-source Python library that generates beautiful, high-density visualizations to kickstart EDA (Exploratory Data Analysis) with just two lines of code. Output is a fully self-contained HTML application.

    The system is built around quickly visualizing target values and comparing datasets. Its goal is to help quick analysis of target characteristics, training vs testing data, and other such data characterization tasks."

TIMESERIES

  1. SCI-KIT LEARN

  2. cuML - multi-GPU, multi-node GPU alternative for sklearn algorithms

  3. Awesome code examples about using svm/knn/naive Bayes/logistic regression in sklearn in python, i.e., “fitting a model onto the data”

Also insanely fast, see here.

  1. Functional API for sklearn, using pipelines. Thank you, sk-lego.
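For reference, a minimal sketch of "fitting a model onto the data" through a pipeline. It uses plain scikit-learn (`make_pipeline`), not the sk-lego API from the link, and the tiny dataset is made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Tiny made-up dataset: one feature, two well-separated classes.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y = np.array([0, 0, 0, 1, 1, 1])

# The pipeline chains preprocessing and the model behind a single fit/predict interface.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

predictions = model.predict(np.array([[0.05], [5.05]]))
train_accuracy = model.score(X, y)
print(predictions, train_accuracy)
```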

FAST.AI

  1. Medium series on all fast.ai courses, 14 posts

PYCARET

1. What is PyCaret? by vidhaya - PyCaret is an open-source machine learning library in Python that helps you from data preparation to model deployment. It is easy to use, and you can do almost every data science project task with just one line of code.

NVIDIA TF CUDA CUDNN

GCP

Resize google disk size, 1, [2](https://www.cloudbooklet.com/how-to-resize-disk-of-a-vm-instance-in-google-cloud/)

GIT / Bitbucket
