BDS 761: Data Science and Machine Learning I



Topic 1: Introduction

This topic:¶

  1. Class topics
  2. Syllabus
  3. Software installation
  4. Q&A Discussion

Reading:

  • https://jupyter.org/try-jupyter/notebooks/?path=notebooks/Intro.ipynb

I. Class topics¶

Catalog description¶

  • Data wrangling
  • Dynamic data visualization
  • Reproducible research
  • Applied machine learning

Course content delivered through lectures and hands-on lab instruction.

General Topic List¶

We will focus on methods and tools in the following broad areas:

  1. Text processing
  2. Matrix algebra methods and software
  3. Introduction to Machine Learning and Deep Learning
  4. Natural Language Processing

Objectives of this class¶

  • Be able to use "core" python libraries in your research
  • Understand how state-of-the-art (SOTA) AI methods are broadly based on these same libraries
  • Be able to implement basic processing and a few machine learning algorithms from "scratch"
  • Generally understand what is going on in research publications

II. Syllabus Discussion¶

  • Homework and readings will be provided at the end of class or via announcement later that evening. They are due at the beginning of the following class. Points are deducted if you show up late.
  • No particular textbook needed
  • A computer is needed to participate in class.
  • Attendance not mandatory (?). Will attempt to record classes. Please do not come to class with anything contagious.
  • Academic integrity - can discuss verbally. Do not share work or copy fellow students' writing or code. Be very careful about basing your work on code from the internet.
  • Office hours TBD.

Course Information¶

  • Labs/Participation/Homework - 20%
  • Midterm - 30%
  • Final Exam - 30%
  • Project - 20%

Point of lab/participation/homework is to encourage you to study and learn. Easy points.

Point of exams is to decide your grade.

Will discuss the project later. Basically it will be a more complete version of a lab project, including validation and a writeup, plus a poster session defending your analysis.

Prerequisites: programming + math¶

  • Programming skills necessary. We will be using Python. If the amount of work seems to be overwhelming, it is most likely due to a deficiency here.

  • Vector geometry

  • Matrix Algebra

  • Prob & Stat won't be used much

Prereqs exist for very good reasons.

Books¶

There is no required text. There is a vast supply of free resources online. Suggestions:

Introduction to Applied Linear Algebra, Boyd & Vandenberghe 2018, http://vmls-book.stanford.edu/

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 2e, Géron 2019

Deep Learning with Python, 2e, Chollet 2021

Speech and Language Processing, 3e, Jurafsky & Martin 2024. https://web.stanford.edu/~jurafsky/slp3/

Academic Integrity, etc.¶

  • See student handbook. This is your contract.
  • Fairness will not be sacrificed for other noble causes
  • Big source of drama: students skipping class or not doing homework then being unhappy with exams they could not handle as a result

III. Software Installation¶

Jupyter - "notebooks" for inline code + LaTeX math + markup, etc.¶

A notebook is a single document containing a series of "cells", each containing either code which can be run, or images and other documentation.

  • Run a cell via [shift] + [Enter] or the "play" button in the menu.

Will execute code and display result below, or render markup etc.

Can also use R or Julia (easily), Matlab, SQL, etc. (with increasing difficulty).

In [2]:
import datetime

print("This code is run right now (" + str(datetime.datetime.now()) + ")")

'hi'
This code is run right now (2020-01-22 18:31:02.681214)
Out[2]:
'hi'
In [3]:
x=1+2+2

print(x)
5
In [4]:
import numpy as np
In [9]:
np.random.randn(2,5)
Out[9]:
array([[ 1.24350758,  1.99906955, -0.3226366 , -0.98266019, -0.1309466 ],
       [-0.85026968, -0.35865037,  0.70637075,  1.06492839,  0.35220974]])
In [12]:
np.ones((2,2))
Out[12]:
array([[1., 1.],
       [1., 1.]])

Installation¶

First project: get Jupyter running and be able to import listed tools

Easiest to install via Anaconda. Preferably Python 3.

https://www.anaconda.com/download/

Highly recommended to make a separate environment for class - hot open source tools change fast and deprecate (i.e. break) old features constantly

conda install jupyter matplotlib numpy scipy scikit-learn pandas ...

Many other packages...
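
One way to check that the first project is done is to run a cell that tries to import each tool from inside the class environment. The following is only a sketch, not part of the course materials: the helper name check_packages and the PACKAGES list are assumptions made for illustration. Note that scikit-learn is imported under the name sklearn.

```python
import importlib
import sys

# Import names for the packages in the conda command above (an assumed list).
# Note that scikit-learn is imported as "sklearn".
PACKAGES = ["matplotlib", "numpy", "scipy", "sklearn", "pandas"]


def check_packages(names):
    """Try to import each name; map it to a version string, or None if missing."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            # Not every module defines __version__, so fall back to "unknown".
            found[name] = getattr(module, "__version__", "unknown")
    return found


# Shows which Python (and therefore which environment) is running this code.
print("Python:", sys.version.split()[0], "at", sys.executable)
for name, version in check_packages(PACKAGES).items():
    print(f"{name:12s}", version if version is not None else "NOT INSTALLED")
```

If the printed path does not point inside the environment created for class, the notebook is probably running from a different Python installation.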

Python Help Tips¶

  • Get help on a function or object via [shift] + [tab] after the opening parenthesis function(
  • Can also get help by executing function?
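
Both tips above rely on Jupyter/IPython. In plain Python, the built-in help() and the standard inspect module give similar information. A minimal sketch, where double and first_doc_line are hypothetical functions defined only to have something to inspect:

```python
import inspect


def double(x):
    """Return x doubled.

    A hypothetical example function, used only to have a docstring to inspect.
    """
    return 2 * x


def first_doc_line(obj):
    """Return the first line of obj's docstring, or "" if it has none."""
    doc = inspect.getdoc(obj)
    return doc.splitlines()[0] if doc else ""


print(inspect.signature(double))  # the call signature
print(first_doc_line(double))     # the one-line summary
print(inspect.getdoc(double))     # the full docstring text
```

Calling help(double) prints the same signature and docstring in a formatted page, and works in a notebook cell as well.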

IV. Q & A Discussion¶

  • Virtual vs In-person classes?
  • Job interests/plans?
  • Research topics?