Machine Learning

Keith Dillon
Fall 2018


Topic 2: Python Tools - Jupyter, Numpy/SciPy, SkLearn (and Matplotlib)

Outline: Leading Python tools:

  1. Jupyter - "notebooks" for inline code + LaTeX math + markup, etc.

  2. NumPy - low-level array & matrix handling and algorithms

  3. SciPy - higher level numerical algorithms (still fairly basic)

  4. Matplotlib - Matlab-style plotting & image display

  5. SkLearn - (Scikit-learn) Machine Learning library

  6. Today's lab: installing tools

Jupyter Notebooks

A single document containing a series of "cells", each holding either code which can be run, or images and other documentation.

  • Run a cell via [shift] + [Enter] or the "play" button in the menu.

This executes the code and displays the result below, or renders the markup, etc.

In [4]:
import datetime

print("This code is run right now (" + str(datetime.datetime.now()) + ")")

This code is run right now (2018-08-30 19:22:01.852213)


First project: get Jupyter running and be able to import listed tools

Easiest to install via Anaconda. Preferably Python 3.

In [ ]:
conda install numpy scipy scikit-learn jupyter matplotlib 

Many other packages are available, e.g. pandas.
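
To confirm that the install worked, each tool can be imported and its version printed. The following is a minimal sketch, not an official check: the helper name check_packages and the PACKAGES list are made up for illustration. Note that scikit-learn is imported under the name sklearn. Running this inside a notebook also confirms that Jupyter itself works.

```python
import importlib

# Import names of the listed tools (scikit-learn imports as "sklearn").
PACKAGES = ["numpy", "scipy", "sklearn", "matplotlib"]


def check_packages(names):
    """Hypothetical helper: map each name to its version, or None if the import fails."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            versions[name] = None
    return versions


status = check_packages(PACKAGES)
for name, version in status.items():
    print(f"{name:12s} {'MISSING' if version is None else version}")
```

Any package reported as MISSING can be installed with the conda command above.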

Python Help Tips

  • Get help on a function or object via [shift] + [tab] after typing the opening parenthesis, as in function(
  • Can also get help by executing function?
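
The notebook shortcuts above display a function's docstring and signature. Outside a notebook the same information is available from plain Python. The sketch below uses statistics.mean from the standard library purely as an example function.

```python
import inspect
import statistics

# The docstring is the text shown by `statistics.mean?` in a notebook.
doc = inspect.getdoc(statistics.mean)
print(doc.splitlines()[0])

# The signature lists the arguments the function accepts.
sig = inspect.signature(statistics.mean)
print("statistics.mean" + str(sig))

# help(statistics.mean) prints the full documentation interactively.
```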


NumPy

Numerical algorithm toolbox, similar to Matlab, but with many key differences, such as zero-based indexing (like C) instead of one-based (like math texts).

Arrays - special data structure which allows direct and efficient linear algebra manipulations.

In [5]:
import numpy as np

x = np.array([2,0,1,8])
print(x)

[2 0 1 8]
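
A short sketch of what zero-based indexing and direct linear algebra look like in practice, using the same array. The matrix A is an arbitrary example added here for illustration only.

```python
import numpy as np

x = np.array([2, 0, 1, 8])

first = x[0]       # zero-based: index 0 is the first element
last = x[-1]       # negative indices count from the end
middle = x[1:3]    # slice of indices 1 and 2 (the end index is exclusive)

doubled = 2 * x    # arithmetic is applied element-wise
dot = x @ x        # inner product: sum of the squared entries

A = np.arange(16).reshape(4, 4)  # example 4x4 matrix with entries 0..15
Ax = A @ x                       # matrix-vector product

print(first, last, middle, doubled, dot, Ax)
```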

Fast Numerical Mathematics

In [1]:
l = range(1000)
%timeit [i ** 2 for i in l]
362 µs ± 34.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [2]:
import numpy as np
a = np.arange(1000)
%timeit a ** 2
2.73 µs ± 342 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
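
The %timeit magic only exists inside Jupyter/IPython. As a rough sketch, the same comparison can be run in plain Python with the standard timeit module. The function names and repetition count below are arbitrary choices, and absolute timings will differ from machine to machine.

```python
import timeit

import numpy as np

n = 1000
values = range(n)
a = np.arange(n)


def squares_list():
    # Pure-Python loop: one interpreted operation per element.
    return [i ** 2 for i in values]


def squares_numpy():
    # Vectorized: the loop runs in compiled code inside NumPy.
    return a ** 2


# Both approaches must produce the same numbers.
same = squares_list() == squares_numpy().tolist()

# Average seconds per call over a fixed number of repetitions.
reps = 1000
t_list = timeit.timeit(squares_list, number=reps) / reps
t_numpy = timeit.timeit(squares_numpy, number=reps) / reps

print(f"list: {t_list * 1e6:.1f} µs, numpy: {t_numpy * 1e6:.1f} µs, same result: {same}")
```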

NumPy for Matlab Users

Will cover more next class.


SciPy

Implements higher-level scientific algorithms using NumPy. Examples:

  • Integration (scipy.integrate)
  • Optimization (scipy.optimize)
  • Interpolation (scipy.interpolate)
  • Signal Processing (scipy.signal)
  • Linear Algebra (scipy.linalg)
  • Statistics (scipy.stats)
  • File IO (
In [3]:
import scipy
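
A minimal sketch of three of the submodules listed above. The integrand, objective function and linear system are toy examples chosen here because their answers are known in advance. They are not taken from the lecture.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# Integration: area under sin(x) on [0, pi]; the exact value is 2.
area, abs_err = integrate.quad(np.sin, 0.0, np.pi)

# Optimization: minimize a quadratic whose minimum is at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2)

# Linear algebra: solve A @ sol = b for a small 2x2 system.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
sol = linalg.solve(A, b)

print(area, result.x, sol)
```

Each submodule is imported by name. A bare import scipy does not necessarily load them in every SciPy version.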


Matplotlib

Tutorial from:

Another important part of machine learning is the visualization of data. The most common tool for this in Python is matplotlib. It is an extremely flexible package, and we will go over some basics here.

Jupyter has built-in "magic functions". One of them, "%matplotlib inline", draws plots directly inside the notebook. It should be on by default.

In [16]:
%matplotlib inline

import matplotlib.pyplot as plt

# Plotting a line
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x));
In [17]:
# Scatter-plot points
x = np.random.normal(size=500)
y = np.random.normal(size=500)
plt.scatter(x, y);
In [18]:
# Showing images using imshow
# - note that origin is at the top-left by default!

x = np.linspace(1, 12, 100)
y = x[:, np.newaxis]

im = y * np.sin(x) * np.cos(y)
print(im.shape)

plt.imshow(im);

(100, 100)
In [19]:
# Contour plots 
# - note that origin here is at the bottom-left by default!
plt.contour(im);
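
The difference in origin between imshow and contour can be checked by inspecting the y-axis limits. This is a sketch under stated assumptions: the Agg backend is selected only so that the code also runs outside a notebook (drop that line when using %matplotlib inline), and the variable names are chosen for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit inside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(1, 12, 100)
y = x[:, np.newaxis]
im = y * np.sin(x) * np.cos(y)

fig, (ax_img, ax_flip, ax_con) = plt.subplots(1, 3, figsize=(12, 4))

ax_img.imshow(im)                    # default origin='upper': row 0 at the top
ax_img.set_title("imshow (default)")

ax_flip.imshow(im, origin="lower")   # row 0 at the bottom instead
ax_flip.set_title("imshow (origin='lower')")

ax_con.contour(im)                   # row 0 at the bottom
ax_con.set_title("contour")

# (bottom, top) limits of each y-axis; an inverted axis has bottom > top.
img_ylim = ax_img.get_ylim()
flip_ylim = ax_flip.get_ylim()
con_ylim = ax_con.get_ylim()
print(img_ylim, flip_ylim, con_ylim)
```

Passing origin="lower" to imshow makes the image orientation match the contour plot.
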
In [20]:
# 3D plotting
from mpl_toolkits.mplot3d import Axes3D
ax = plt.axes(projection='3d')
xgrid, ygrid = np.meshgrid(x, y.ravel())
ax.plot_surface(xgrid, ygrid, im, cstride=2, rstride=2, linewidth=0);

There are many more plot types available. See matplotlib gallery.

Test these examples: copy the Source Code link, and put it in a notebook using the %load magic. For example:

In [21]:
# %load


Scikit-learn

Considered the leading machine learning toolbox.

Provides many machine learning functions.

A couple dozen core developers plus hundreds of other contributors.

2011 tutorial has over 10,000 citations.

In [36]:
import sklearn as sk


In this class:

We will emphasize the use of these tools to run algorithms rather than the ability to derive and code methods (which would require significantly more prerequisite abilities).

We will focus on the key issues in using machine learning methods in practical applications, such as loading data, preprocessing, validation, and determining which method to use.
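
The steps named above (loading data, preprocessing, validation, choosing a method) can be sketched in a few lines of scikit-learn. The dataset, scaler and classifier below are illustrative assumptions picked for this example, not the methods prescribed for the course.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Loading data: a small built-in dataset (150 samples, 4 features).
X, y = load_iris(return_X_y=True)

# Preprocessing + method: standardize the features, then fit a classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Validation: 5-fold cross-validation gives one accuracy score per fold.
scores = cross_val_score(model, X, y, cv=5)

print("fold accuracies:", scores)
print("mean accuracy:", scores.mean())
```

Swapping in a different classifier only changes the make_pipeline line, which makes it straightforward to compare methods on the same validation scheme.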

Lab: get this stuff running!