module - a single .py file in which functions and other things are defined.
package - a directory of modules
import math
from math import cos
from math import *
note that a module is just a Python script: it may define things, and any top-level code runs when the module is first imported
https://docs.python.org/3/library/math.html
"This module provides access to common mathematical functions and constants, including those defined by the C standard."
import math
#dir(math) # list attributes and methods in math module
#help(math) # print the comment at top of module's .py file
import numpy # a package
import numpy.linalg # a module?
numpy.__name__, numpy.linalg.__name__
('numpy', 'numpy.linalg')
#dir(numpy.linalg)
Functions and variables you define can be saved in a separate file and imported.
The file needs to be in the current working directory or on the path.
import sys
sys.path.append("c:\\path\\to\\my\\file") # append the directory containing the .py file
import myfile
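What myfile.py might contain, as a hypothetical sketch (the names myvar and myfunc are made up). Note the top-level print runs once, at import time, as mentioned above:

# myfile.py (hypothetical contents)
print('myfile imported')    # top-level code: runs once, when first imported

myvar = 42                  # accessible afterwards as myfile.myvar

def myfunc(x):              # accessible as myfile.myfunc
    return x + myvar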
If you change the file, the changes won't be reflected, because Python caches imported modules behind the scenes.
Simply importing again doesn't work either. Use importlib (as of current Python).
import importlib
importlib.reload(myfile)
Can define functions inside of functions (can import too).
Still not run until called.
def f():
print('f')
def f1():
print('f1')
def f1_1():
print('f1_1')
f1_1()
def f2():
print('f2')
f2()
f1()
print('outside')
f()
outside
f
f2
f1
f1_1
Each function has its own 'local' scope, distinct from the parent scope.
A namespace is a named scope that you can potentially access from outside by name.
math.cos # access cos() function from math namespace
Functions can potentially access variables and functions in the parent scopes.
Can explicitly refer to global scope using 'global'
Can explicitly refer to parent scope using 'nonlocal' (unless parent is global)
Scope refers to the definitions that are local at a given point in the program, meaning currently in use.
Everything else is saved temporarily, which is what supports nested calls, recursion, multiple threads, etc.
x = 5 // parent of local scope
{ // new local scope created by something with brackets (like a function)
x = 1
print(x) // prints 1
} // scope ended. all local variables deleted
print(x) // prints 5
# Python uses indentation instead of brackets; an indented block must belong to a function or other statement.
x=5
def f0(x):
x = 2
def f1(x):
x = 3
return x
def f2():
return x
print('f1(x)',f1(x))
print('f2(x)',f2())
print('at end of f0', x)
return x
print('f(x)',f0(x))
print('outside function', x)
f1(x) 3
f2(x) 2
at end of f0 2
f(x) 2
outside function 5
what happens if we comment out the 'x = 2' line?
Global means variables or functions accessible from everywhere.
Local means accessible from inside the scope, or possibly from outside by giving a path (namespace).
x=5 # "global" since inside nothing. main notebook memory
def say():
x = 2 # nonlocal according to child nested functions
print('start parent function:',x)
def say_nonlocal():
nonlocal x
print('nonlocal from view of nested function:',x)
def say_global():
global x
print('global:',x)
say_nonlocal()
say_global()
print('end parent function:',x)
say()
start parent function: 2
nonlocal from view of nested function: 2
global: 5
end parent function: 2
An IPython (the core of Jupyter) "magic" command for viewing the current namespace.
Can often just type 'whos' without the % when it's on a line by itself.
%reset - clear namespace
%store - save namespace
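For example, %store can save a variable to IPython's database on disk and restore it later, even in a new session (a small sketch):

x = 5
%store x      # save x
# ... later, possibly after a restart:
%store -r x   # restore x into the namespace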
%reset
%whos
Once deleted, variables cannot be recovered. Proceed (y/[n])? y
Interactive namespace is empty.
%whos
Interactive namespace is empty.
x = 5
%whos
Variable   Type    Data/Info
----------------------------
x          int     5
#locals()
#globals()
#dir()
#dir(math)
A variable that identifies the current namespace.
Note it is not quite the same as the current scope, e.g. inside a nested function.
__name__
'__main__'
def myfunction():
print('__name__ seen inside myfunction:',__name__)
myfunction()
print('f.__name__ =', myfunction.__name__)
__name__ seen inside myfunction: __main__
f.__name__ = myfunction
import math
math.__name__
'math'
# __main__ is commonly used so a script behaves differently when run directly versus imported
def f():
print("run")
if __name__ == "__main__": # only true if file (or notebook cell) is run, not imported
f()
run
A default argument assigns a value to a parameter at function definition time. If the caller omits that argument, the default is used.
def power(x, exponent=2):
return x ** exponent
print(power(3)) # 9
print(power(3, 3)) # 27
9
27
Allow arguments to be passed by name rather than position.
This improves clarity and allows varying the order
print(power(x=3, exponent=3))
print(power(exponent=3, x=3))
27
27
* packs/unpacks a tuple
** packs/unpacks a dict
The direction depends on whether the variable is being created (packing) or already exists (unpacking)
*args collects extra positional arguments into a tuple.
def f(a, *args): # accepts variable number of arguments
print(a)
print(args)
f(1, 2, 3, 4)
1
(2, 3, 4)
def f(*args):
print(args)
f(1, 2, 3, 4)
(1, 2, 3, 4)
Used to hand arguments from our function on to another function when their number may be unknown
def print2(*args): # inputs packed into args with *
print("running print 2")
print(*args) # args unpacked to inputs with *
print2('this','acts','like','a','print')
running print 2
this acts like a print
Collects keyword arguments into a dictionary
def g(a, **kwargs):
print(a)
print(kwargs)
g(1, x=10, y=20)
1
{'x': 10, 'y': 20}
Use both * and ** to capture inputs both ways
def g(*args, **kwargs):
print(args)
print(kwargs)
g(1, 2, x=10, y=20) # order needs to be same as definition, here position then keyword args
(1, 2)
{'x': 10, 'y': 20}
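The other direction also works when calling a function: an existing tuple or dict can be unpacked into arguments. A small sketch reusing g from above (the names args and opts are arbitrary):

args = (1, 2)
opts = {'x': 10, 'y': 20}
g(*args, **opts)   # tuple unpacked into positional args, dict into keyword args
# equivalent to g(1, 2, x=10, y=20)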
Python treats functions as 'first-class objects' (on the same level as numbers, strings, ...) in terms of how they can be used in programming.
Functions can be: assigned to variables, passed as arguments, returned from other functions, and stored in data structures.
def f(x):
return x*x
f(2)
4
print('initial', f(2))
def f(x):
return x
print('final',f(2))
initial 4
final 2
Note what happens if the above cell (only) is run a second time.
max(1,2)
2
id(max)
1704608609440
original_max = max
id(original_max)
1704608609440
def max(a,b):
return a
max(1,2)
1
original_max(1,2)
2
print(id(max), id(original_max))
1704692034720 1704608609440
# note the same thing can happen with variables
x = 5
print(id(x))
x = x+1
print(id(x))
1704608295344
1704608295376
# "factory pattern": returns a function based on an input selection
def add1(x,y):
return x+y
def add2(x,y):
return -1*(x+y)
addtype = '2'
def addfactory(addtype):
if addtype=='1':
adder = add1
elif addtype=='2':
adder = add2
    else:
        adder = None  # no function matched; caller should check for this
        print('unrecognized selection')
return adder
myadder = addfactory(addtype)
myadder(5,5)
-10
myadder
<function __main__.add2(x, y)>
print(id(add1), id(add2), id(myadder))
1704692903216 1704692903072 1704692903072
def myfunc(a):
return int(a)
# this tester checks for ability to handle bad inputs
def mytester(f, x):
try:
print('testing...', x)
f(x)
print('test passed')
    except Exception:
print('test failed')
mytester(myfunc, 1)
mytester(myfunc, '1')
mytester(myfunc, 'a')
testing... 1
test passed
testing... 1
test passed
testing... a
test failed
A decorator is a function that takes a function and returns a function. It creates an invisible wrapper that runs in addition to the function.
def my_decorator(f):
def wrapper(x):
print("before")
result = f(x)
print("after")
return result
return wrapper
@my_decorator
def f(x):
return x * 2
f(6)
before
after
12
import time
def my_timer_decorator(func):
    def wrapper(*args, **kwargs): # collects any number of positional and keyword arguments ('kwargs' = keyword args)
start = time.perf_counter()
result = func(*args, **kwargs)
elapsed = time.perf_counter() - start
print("function took", elapsed, "seconds")
return result
return wrapper
@my_timer_decorator
def f100(x):
return x**100
f100(2.3)
function took 6.299989763647318e-06 seconds
1.4886191506362924e+36
Annotations add information ("metadata") to the function itself. Python does not use them at runtime; they are for external tools (type checkers, debuggers, etc.)
def area(r):
return 3.14159 * r * r
area
<function __main__.area(r)>
def area(r: float) -> float:
return 3.14159 * r * r
area
<function __main__.area(r: float) -> float>
area.__annotations__
{'r': float, 'return': float}
An "inline" way to define a function that computes a single expression. Sometimes convenient.
f = lambda x: x * x
print(f)
print(f(3)) # 9
<function <lambda> at 0x0000018CE782FE50>
9
def square(x):
return x*x
print(square)
print(square(3)) # 9
<function square at 0x0000018CE7902280>
9
Functions that call themselves
def recurf(i):
i = i+1
print(i)
if i>10:
print('done\n')
return
recurf(i)
recurf(1)
recurf(5)
2
3
4
5
6
7
8
9
10
11
done

6
7
8
9
10
11
done
Recursive calculation is inefficient here because the same terms are repeatedly recalculated
def fib_recursive(n):
if n == 0: return 0
if n == 1: return 1
return fib_recursive(n-1) + fib_recursive(n-2)
Each term requires two additional terms be calculated. Exponential time.
Intelligently plan terms to calculate and store ("cache"), e.g. in a table.
Each term requires one term be calculated.
Always plan out the data structure and calculation order.
The data structure to fill in for Fibonacci is trivial: a list holding $F_0, F_1, \ldots, F_n$.
The optimal order is to start at the bottom and work up to $n$, so you always have what you need for the next term.
def fib_recursive(n):
if n == 0: return 0
if n == 1: return 1
return fib_recursive(n-1) + fib_recursive(n-2)
def fib_dp(n):
fib_seq = [0, 1]
for i in range(2,n+1):
fib_seq.append(fib_seq[i-1] + fib_seq[i-2])
return fib_seq[n]
Let's compare the runtime
%timeit -n4 fib_recursive(30)
390 ms ± 25.3 ms per loop (mean ± std. dev. of 7 runs, 4 loops each)
%timeit -n4 fib_dp(100)
14.5 µs ± 1.79 µs per loop (mean ± std. dev. of 7 runs, 4 loops each)
Complexity in algorithm theory refers to the runtime of an algorithm, usually described by a function which bounds the worst-case number of computations based on the input size.
Basically in data science we want linear complexity, meaning the computation time is a scalar multiple of the data size.
Exponential complexity is impractical for all but the smallest inputs. Such algorithms are replaced by rough approximations out of necessity.
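One way to see the exponential growth directly is to count the calls the naive recursion makes. A small sketch (fib_counted is a hypothetical instrumented copy of fib_recursive):

calls = 0
def fib_counted(n):
    global calls
    calls += 1          # count every invocation
    if n < 2: return n
    return fib_counted(n-1) + fib_counted(n-2)

for n in (10, 20, 30):
    calls = 0
    fib_counted(n)
    print(n, calls)     # 177, 21891, 2692537: roughly x123 more calls per +10 in n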
Temporarily stores the results of processed inputs, such as the results of function calls.
If the same input, or a function call with the same arguments, comes up again, the previously stored result is reused and unnecessary calculations are avoided.
In many cases a simple array is used for storing the results, but many other structures work as well, such as associative arrays (called hashes in Perl or dictionaries in Python).
def fib_recursive(n):
if n == 0: return 0
if n == 1: return 1
return fib_recursive(n-1) + fib_recursive(n-2)
def fib_dp(n):
cache = [0,1]
for i in range(2,n+1):
cache.append(cache[i-1]+cache[i-2])
return cache[n]
def memoize(f):
memo = {}
def helper(x):
if x not in memo:
memo[x] = f(x)
return memo[x]
return helper
fib_recursive_memoized = memoize(fib_recursive)
fib_recursive_memoized(9)
34
%timeit fib_recursive(9)
6.23 μs ± 177 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
%timeit fib_recursive_memoized(9)
80.7 ns ± 1.49 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
from functools import lru_cache
@lru_cache()
def fib_recursive(n):
"Calculate nth Fibonacci number using recursion"
if n == 0: return 0
if n == 1: return 1
return fib_recursive(n-1) + fib_recursive(n-2)
%timeit fib_recursive(30)
55 ns ± 1.69 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
import joblib
#dir(joblib)
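joblib can persist cached results to disk so they survive between sessions. A minimal sketch using joblib.Memory (the cache directory name and slow_square are arbitrary):

from joblib import Memory
memory = Memory('./joblib_cache', verbose=0)   # results stored in this directory

@memory.cache
def slow_square(x):
    return x * x

slow_square(4)   # computed and written to disk
slow_square(4)   # later calls (even in a new session) load the stored result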
How can we use redundancy here?
In a recursive approach, your function calls itself on smaller subproblems as needed until the calculation is done, potentially performing many redundant calculations.
With memoization, you cache the results of these recursive calls and check the cache before recalculating. There is still some risk, since memory needs and usage are not planned in advance.
With dynamic programming, you implement a bottom-up plan to produce the cache of results for smaller subproblems.