BDS 754: Principles of Programming with Python



Topic 1: Introduction

This topic:¶

  1. Motivation
  2. Class topics
  3. Syllabus
  4. Using A.I.
  5. Git
  6. Jupyter

Reading:

  • https://jupyter.org/try-jupyter/notebooks/?path=notebooks/Intro.ipynb

0. Motivation¶

Python is in Demand¶

[Figure omitted]

Huge Ecosystem¶

Popularity means more people creating products in Python, including free software.


Most of the software underlying modern A.I. consists of free Python libraries and the numerical libraries they build on (PyTorch, NumPy, BLAS variants).

https://duchesnay.github.io/pystatsml/introduction/introduction_python_for_datascience.html

https://trends.google.com/trends/explore?cat=31&date=all&q=python,R,matlab,spss,stata

I. Class topics¶

Course Theme¶

This course will cover Python for people without programming skills.

Specific focus will be on methods, libraries, and tools used in Data Science, including linear algebra, text processing, and modern A.I.

General Topic List¶

We will focus on the following broad areas:

  1. Programming basics in Python
  2. Text and string processing
  3. Mathematical programming
  4. Key Python libraries: numpy, scipy, matplotlib, ...
  5. Programming tools: jupyter, git, VSCode

Objectives of this class¶

  • Be able to write basic data manipulation and algorithms from scratch, with well-formatted code.
  • Be able to use core Python libraries in your research.
  • Become familiar with the basic programming tools used in Data Science and related fields.

II. Syllabus Discussion¶

  • Homework and readings will be provided at end of class or via announcement later that evening. Due at beginning of following class. Points may be deducted if turned in late.
  • A computer is required to participate in class. Participation is required to pass the class.
  • Attendance not mandatory (can be online). Will attempt to record classes. Please do not come to class with anything contagious.
  • Missed exam questions have a nearly perfect correlation with missed classes and skipped homework.
  • Academic integrity - can discuss, use A.I., or any other resources. But turn in your own version, not a copy of others'. Expect to fail the quizzes and exams if you rely on A.I.
  • Office hours TBD.

Course Information¶

  • Labs/Participation/Homework - 15%
  • Quizzes - 15%
  • Midterm - 30%
  • Final Exam - 40%

Point of lab/participation/homework is to encourage you to study and learn. Easy points.

Quizzes will be closed-book and test your understanding of the basic concepts from the homework and previous week.

Point of exams is to decide your grade. They will be based on the quizzes and homeworks.

Prerequisites: DS program admission basic requirements¶

  • Vector geometry

  • Matrix Algebra

  • Prob & Stat won't be used much

Books¶

We will follow much of Python for Everybody by Charles Severance for programming basics.

The text is available online: https://www.py4e.com/book.php

The lessons are available at: https://www.py4e.com/lessons

Lecture videos, slides, code, etc.: https://www.py4e.com/materials

There is a vast supply of other free Python resources online.

Some Python and linear algebra review material: https://www.keithdillon.com/index.php/bootcamp/

III. Using A.I.¶

Vibe coding¶

A programming style based on intuition, momentum, and creative flow, letting A.I. handle the details, rather than careful planning, formal specifications, or following best practices.

May be ok for...

  • Early prototyping and demos
  • Creative coding and generative art
  • Solo projects with low coordination cost

Drawbacks: accumulates technical debt and does not scale well to large systems or teams.

Technical debt¶

Past decisions made while creating software that inhibit the ability to build on it. Eventually the debt must be "paid" by rewriting or refactoring.

Example: a function to compute the mean of a list of numbers¶

In [ ]:
# simple solution. easy to understand

def mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)
In [ ]:
# potential A.I. solution...

import numpy as np
import pandas as pd
from statistics import mean as stat_mean
from typing import List, Optional
import logging

logging.basicConfig(level=logging.INFO)

class StatsEngine:
    def __init__(self, data: List[float]):
        self.df = pd.DataFrame({"values": data})

    def compute_mean(self) -> Optional[float]:
        try:
            np_array = self.df["values"].to_numpy(dtype=float)
            result = stat_mean(np_array)
            logging.info("Mean computed successfully")
            return float(result)
        except Exception as e:
            logging.error("Failed to compute mean: %s", e)
            return None

def mean(xs: List[float]) -> float:
    engine = StatsEngine(xs)
    result = engine.compute_mean()
    if result is None:
        raise ValueError("mean computation failed")
    return result

Why did A.I. do this?¶

It was trained on code written with different priorities from yours.

  1. Those complexities may be needed in production, such as checking for bad inputs on a web tool.
  2. People who compute stats for data probably are using those toolboxes already.
  3. People who wrote the code used as training data often don't know how to do the fundamental math themselves, and hence rely on toolboxes.

Drawbacks¶

The code has all these dependencies, meaning code from elsewhere that it leverages. This adds greatly to the size of the software product, hurts its performance, and possibly affects licensing.

import numpy as np
import pandas as pd
from statistics import mean as stat_mean
from typing import List, Optional
import logging

If any of them changes or becomes unavailable the code must be updated. This process continues constantly in industry.

This complexity also inhibits your ability to understand the code (and learn in our class!) if something goes wrong or needs to be changed. For example, suppose that for some project you want to change the statistic computation slightly. Maybe the library can't even do what you want.
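As a rough illustration of that last point, here is a minimal sketch of changing the statistic slightly when you own the simple version. The name trimmed_mean and its parameter k are hypothetical, chosen only for this example; they are not part of the course material.

```python
def mean(xs: list[float]) -> float:
    """Plain arithmetic mean, as in the simple solution earlier."""
    return sum(xs) / len(xs)


def trimmed_mean(xs: list[float], k: int = 1) -> float:
    """Mean after dropping the k smallest and k largest values.

    Hypothetical variation for illustration; k is an assumed parameter.
    """
    if len(xs) <= 2 * k:
        raise ValueError("need more than 2*k values to trim")
    kept = sorted(xs)[k:len(xs) - k]  # drop k values from each end
    return mean(kept)


data = [1.0, 2.0, 3.0, 4.0, 100.0]
print(mean(data))          # 22.0 (pulled up by the outlier)
print(trimmed_mean(data))  # 3.0  (outlier and smallest value removed)
```

With the two-line mean, the change is a few more lines that you can read and check by hand. With the class-based version, the same change would mean working through pandas, numpy, and the statistics module first.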

Using A.I. Effectively¶

A.I. is good at some things and in some applications, and bad at others. You need to find out which applies in your case.

Examples I've seen recently:

  • Confusing different forms of the statistical score and Fisher information, then running with that to produce mathematical nonsense.
  • Getting confused by subtle aspects of Python itself (references versus values).
  • Treating quantum mechanics as if it were classical mechanics.
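The "references versus values" item can be made concrete with a short sketch. This is only an illustration of the general Python behavior, not a reconstruction of the specific A.I. mistakes mentioned above; the variable names are arbitrary.

```python
a = [1, 2, 3]
b = a              # b is a second name for the SAME list object, not a copy
b.append(4)
print(a)           # [1, 2, 3, 4]  (a changed too)
print(a is b)      # True

c = a.copy()       # c is a new list with the same contents (shallow copy)
c.append(5)
print(a)           # [1, 2, 3, 4]  (a is unaffected this time)
print(c)           # [1, 2, 3, 4, 5]


def add_item(items, item):
    # Mutates the list the caller passed in.
    items.append(item)


def rebind(items):
    # Only rebinds the local name; the caller's list is untouched.
    items = [0]


add_item(a, 99)
rebind(a)
print(a)           # [1, 2, 3, 4, 99]
```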

Improving (optimizing) the prompt you give, including instructions in the settings, makes a huge difference. What works best is often not intuitive.

Middle-to-Middle solution¶

"End-to-end" (e2e) operation - trying to get A.I. to give the final answer when provided with the original problem.

e2e might work, or it might give a nonworking mess that's impossible to understand.

Instead, break the problem into smaller tasks, where you can understand and check each part. This is how you would help an assistant get a large task done.
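A minimal sketch of this idea, assuming a made-up task ("report the average word length of a text"). The function names are hypothetical. Each piece is small enough to request, read, and check on its own before combining them.

```python
def split_words(text: str) -> list[str]:
    """Task 1: split text into words on whitespace."""
    return text.split()


def word_lengths(words: list[str]) -> list[int]:
    """Task 2: length of each word."""
    return [len(w) for w in words]


def mean(xs: list[float]) -> float:
    """Task 3: arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


# Check each part separately on an input small enough to verify by hand.
assert split_words("a bb ccc") == ["a", "bb", "ccc"]
assert word_lengths(["a", "bb", "ccc"]) == [1, 2, 3]
assert mean([1, 2, 3]) == 2.0


def average_word_length(text: str) -> float:
    """Combine the checked parts into the final answer."""
    return mean(word_lengths(split_words(text)))


print(average_word_length("a bb ccc"))  # 2.0
```

If the final number looks wrong, the per-part checks show which step to look at, which is much harder with one large generated block.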

IV. Git / GitHub¶

Source Control¶

Non-source-control approach - make a copy of all your code periodically, or each time you reach an important point (like you added something and got it working). Then if you make changes and break something, you can go back or compare exactly what got changed. You might do the same for papers you write.

Source control - a.k.a. version control. Software that tracks such changes for you, remembering all previous versions and allowing changes to be undone and compared.

You mark the "check-ins" (called commits in Git), which tell the source control software when to find and remember everything you changed.

The collection of check-ins is called the repository, or repo.

Source control tools use efficient string-difference algorithms to show and store only the minimal information for each saved change, e.g., a minimum-edit-distance change.
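To get a feel for what a line-level difference looks like, here is a small sketch using Python's standard difflib module. It illustrates the general idea only; it is not how any particular source control tool is implemented, and the file names and contents are made up for the example.

```python
import difflib

# Two hypothetical versions of the same small file, as lists of lines.
old = ["Meeting notes", "Topic: Git", "Status: draft"]
new = ["Meeting notes", "Topic: Git and GitHub", "Status: draft"]

# unified_diff yields header lines, then lines prefixed with
# ' ' (unchanged), '-' (removed), or '+' (added).
diff = list(
    difflib.unified_diff(
        old, new, fromfile="notes_v1.txt", tofile="notes_v2.txt", lineterm=""
    )
)

for line in diff:
    print(line)
```

Only the changed line shows up with - and + markers; the unchanged lines appear as context. The git diff command used later in the walkthrough prints output in a similar style.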

Git¶

"Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency."

You can create and access repos, view changes in them, and do more complex things like merging different versions of files.

https://git-scm.com/install/windows

https://gitforwindows.org/

https://github.com/apps/desktop

Commandline operation¶

git init                 # start tracking a project
git clone <url>          # copy an existing repo
git status               # see what changed
git add <file>           # stage changes
git commit -m "message"  # create a snapshot
git log                  # view history
git branch               # list branches -- more advanced, don't worry about it now
git checkout -b feature  # create and switch branch
git merge feature        # combine branches -- more advanced, don't worry about it now
git pull                 # fetch + merge from remote
git push                 # send commits to remote

GitHub¶

A cloud-based code sharing and backup platform that can connect to a Git repo.

It hosts a huge number of shared projects. Most open-source software is here.

https://github.com/

https://github.com/trending

Create account to join classroom repo: https://github.com/signup

Git walkthrough¶

Setup (one time)¶

  • Install Git for Windows.
  • Open Git Bash.
  • Instructor provides a repository URL with a single text file, e.g. notes.txt.

Step 1 Clone (get the project)¶

git clone <REPO_URL>
cd <repo-folder>

This creates a local copy with full history.

Step 2 Pull (sync with remote)¶

Not needed yet since we just got it, but try out the command.

git pull

Step 3 Make a change¶

  • Edit one or more of the files using your favorite tool.
  • Save the change(s)

Step 4 Check what changed¶

git status

Shows modified files.

git diff

Shows exactly what lines changed. Students must read this output.

Alternative: use a graphical merge tool such as Meld or WinMerge.

Step 5 Stage the change¶

git add notes.txt

Verify:

git status

Step 6 Commit (record the change)¶

git commit -m "Add note with name and date"

This creates a permanent snapshot.

Check history:

git log --oneline

Step 7 Push (send to remote)¶

git push

Now the change exists on the server.

V. Jupyter¶

Jupyter - "notebooks" for inline code + LaTeX math + markup, etc.¶

A single document containing a series of "cells", each containing either code that can be run, or images and other documentation.

  • Run a cell via [shift] + [Enter] or "play" button in the menu.

Will execute code and display result below, or render markup etc.

Can also use R or Julia (easily), Matlab, SQL, etc. (with increasing difficulty).

Anaconda¶

Easiest to install Jupyter via Anaconda on Windows/Mac. Or directly with Linux.

https://www.anaconda.com/download/

Highly recommended to make a separate environment for class - hot open source tools change fast and deprecate (i.e. break) old features constantly.

conda install jupyter matplotlib numpy scipy  ...

Many other packages...
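Because versions change quickly, it can help to record which versions your environment actually has. The sketch below uses the standard library's importlib.metadata. The helper name installed_versions is hypothetical, and the package list simply mirrors the conda command above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


# The output depends on your own environment, so none is shown here.
print(installed_versions(["jupyter", "matplotlib", "numpy", "scipy"]))
```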

Python Help Tips¶

  • Get help on a function or object via [shift] + [tab] after the opening parenthesis function(
  • Can also get help by executing function?
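The same information is available from plain Python, outside of Jupyter's shortcuts. A minimal sketch, using a small example function defined here only for illustration:

```python
import inspect


def mean(xs):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(xs) / len(xs)


help(mean)                      # prints the signature and docstring
print(mean.__doc__)             # the docstring by itself
print(inspect.signature(mean))  # (xs)
```

The text shown by help and by the ? shortcut comes from the docstring, which is one reason to write docstrings for your own functions.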
In [1]:
import datetime

print("This code is run right now (" + str(datetime.datetime.now()) + ")")

'hi'
This code is run right now (2025-08-26 14:41:26.932482)
Out[1]:
'hi'
In [3]:
x=1+2+2

print(x)
5
In [4]:
import numpy as np
In [9]:
np.random.randn(2,5)
Out[9]:
array([[ 1.24350758,  1.99906955, -0.3226366 , -0.98266019, -0.1309466 ],
       [-0.85026968, -0.35865037,  0.70637075,  1.06492839,  0.35220974]])
In [12]:
np.ones((2,2))
Out[12]:
array([[1., 1.],
       [1., 1.]])