Functions: Advanced Topics¶

  • modules/packages
  • scope
  • arguments
  • references to functions
  • lambda functions
  • annotations
  • decorators
  • recursion

Importing modules and packages¶

module - a single .py file with functions and other things defined.

package - a directory of modules (typically marked by an __init__.py file)

import math
from math import cos
from math import *

note that a module is just a Python script: it may only define things, or it may also contain code that runs when imported
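A minimal sketch of this idea: writing a tiny module to disk, putting its directory on the path, and importing it. The file/module name `mymod` and its contents are hypothetical, chosen just for this example.

```python
import pathlib
import sys
import tempfile

# Write a tiny module to a temporary directory (the module name
# 'mymod' is hypothetical, chosen for this example).
tmpdir = tempfile.mkdtemp()
pathlib.Path(tmpdir, "mymod.py").write_text(
    "GREETING = 'hello'\n"
    "def greet():\n"
    "    return GREETING\n"
)

sys.path.append(tmpdir)  # make the directory importable
import mymod             # runs mymod.py top to bottom

print(mymod.greet())     # -> hello
```

Note that the `import mymod` statement executes the file once, top to bottom, which is why its definitions become available.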

math module.¶

https://docs.python.org/3/library/math.html

"This module provides access to common mathematical functions and constants, including those defined by the C standard."

In [87]:
import math

#dir(math) # list attributes and methods in math module
In [88]:
#help(math) # print the comment at top of module's .py file
In [54]:
import numpy # a package

import numpy.linalg # a module?

numpy.__name__, numpy.linalg.__name__
Out[54]:
('numpy', 'numpy.linalg')
In [89]:
#dir(numpy.linalg)

creating libraries¶

Functions and variables you define can be saved in a separate file and imported.

The file needs to be in the current working directory or on the Python path (sys.path)

import sys
sys.path.append("c:\\path\\to\\my\\file")
import myfile

If you change the file, the change won't be reflected when you run it again, due to behind-the-scenes caching of imported modules.

Simply importing again doesn't work either. Use importlib (available since Python 3.4).

import importlib
importlib.reload(myfile)

Nested functions¶

Can define functions inside of functions (can import too).

Still not run until called.

In [66]:
def f():
    print('f')
    def f1():
        print('f1')
        def f1_1():
            print('f1_1')
        f1_1()

    def f2():
        print('f2')

    f2()
    f1()

print('outside')
f()
outside
f
f2
f1
f1_1

Scope and Namespaces¶

Each function has its own 'local' scope, which differs from the parent scope

A namespace is a named scope that you can potentially access from outside by name.

math.cos # access cos() function from math namespace

Functions can potentially access variables and functions in the parent scopes.

Can explicitly refer to the global scope using 'global'

Can explicitly refer to the nearest enclosing (parent) scope using 'nonlocal' (unless the parent is the global scope)

Scope in other languages¶

Scope refers to the set of definitions that are visible (in current use) at a given point in the program

Everything else is saved temporarily, to support hierarchical operations, multiple threads, etc.

x = 5 // parent of local scope

{ // new local scope created by something with brackets (like a function)
    x = 1

    print(x) // prints 1

} // scope ended. all local variables deleted

print(x) // prints 5
In [56]:
# python has indents instead of brackets. Indents must be part of function or other statement.

x=5 

def f0(x):
    x = 2
    
    def f1(x):
        x = 3
        return x
        
    def f2():
        return x
    
    print('f1(x)',f1(x))
    print('f2(x)',f2())
    
    print('at end of f0', x)
    
    return x

print('f(x)',f0(x))

print('outside function', x)
f1(x) 3
f2(x) 2
at end of f0 2
f(x) 2
outside function 5

what happens if we comment out the 'x = 2' line?

Globals and locals¶

Global means variables or functions accessible from everywhere.

Local means accessible only from inside the scope, or possibly from outside by giving a qualified path.

In [64]:
x=5 # "global" since inside nothing. main notebook memory

def say():
    x = 2 # nonlocal according to child nested functions
    
    print('start parent function:',x)

    def say_nonlocal():
        nonlocal x
        print('nonlocal from view of nested function:',x)
        
    def say_global():
        global x 
        print('global:',x)
        
    say_nonlocal()
    say_global()
    
    print('end parent function:',x)

say()
start parent function: 2
nonlocal from view of nested function: 2
global: 5
end parent function: 2

%whos¶

an ipython (core of Jupyter) "magic" command for viewing current namespace

Can often just type 'whos' by itself

%reset - clear namespace
%store - save namespace
In [100]:
%reset

%whos
Once deleted, variables cannot be recovered. Proceed (y/[n])? y
Interactive namespace is empty.
In [101]:
%whos
Interactive namespace is empty.
In [102]:
x = 5

%whos
Variable   Type    Data/Info
----------------------------
x          int     5
In [85]:
#locals()
In [84]:
#globals()
In [82]:
#dir()
In [83]:
#dir(math)

__name__¶

Variable that identifies the current namespace.

Note: not quite the same as the current scope, e.g. inside a nested function.

In [26]:
__name__
Out[26]:
'__main__'
In [39]:
def myfunction():
    print('__name__ seen inside myfunction:',__name__)
    
myfunction()
print('f.__name__ =', myfunction.__name__)
__name__ seen inside myfunction: __main__
f.__name__ = myfunction
In [30]:
import math

math.__name__
Out[30]:
'math'
In [67]:
# __main__ commonly used for scripts to behave differently if imported versus testing

def f():
    print("run")

if __name__ == "__main__": # only true if file (or notebook cell) is run, not imported
    f()
run

Default arguments¶

Assign a value to a parameter at function definition time. If the caller omits that argument, the default is used.

In [113]:
def power(x, exponent=2):
    return x ** exponent

print(power(3))      # 9
print(power(3, 3))   # 27
9
27

Keyword arguments¶

Allow arguments to be passed by name rather than position.

This improves clarity and allows the arguments to appear in any order

In [115]:
print(power(x=3, exponent=3))
print(power(exponent=3, x=3))
27
27

Packing and unpacking operators¶

* packs/unpacks a tuple

** packs/unpacks a dict

The direction depends on whether the variable is being created (packing) or already exists (unpacking)
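A minimal sketch of the two directions: * in a function definition packs extra arguments into a new tuple, while * at a call site unpacks an existing tuple into separate arguments.

```python
def f(a, b, c):           # fixed parameter list
    return a + b + c

args = (1, 2, 3)
print(f(*args))           # existing tuple unpacked into positional arguments -> 6

def g(*args):             # extra positional arguments packed into a new tuple
    return args

print(g(1, 2, 3))         # -> (1, 2, 3)
```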

Packing to tuple with *¶

*args collects extra positional arguments into a tuple.

In [118]:
def f(a, *args): # accepts variable number of arguments
    print(a)
    print(args)

f(1, 2, 3, 4)
1
(2, 3, 4)
In [119]:
def f(*args):
    print(args)

f(1, 2, 3, 4)
(1, 2, 3, 4)

packing inputs then unpacking into new function¶

Used to hand arguments from our function to another function when the number of arguments may be unknown

In [122]:
def print2(*args): # inputs packed into args with *
    print("running print 2")
    print(*args) # args unpacked to inputs with *
    
print2('this','acts','like','a','print')
running print 2
this acts like a print

Packing to dict with **¶

Collects keyword arguments into a dictionary

In [124]:
def g(a, **kwargs):
    print(a)
    print(kwargs)

g(1, x=10, y=20)
1
{'x': 10, 'y': 20}
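The reverse direction also works: ** at a call site unpacks an existing dict into keyword arguments. A sketch reusing the power function from earlier:

```python
def power(x, exponent=2):
    return x ** exponent

# an existing dict of keyword arguments, unpacked into the call
settings = {'x': 3, 'exponent': 3}
print(power(**settings))   # -> 27
```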

Accepting variable numbers of inputs¶

Use both * and ** to capture inputs both ways

In [129]:
def g(*args, **kwargs):
    print(args)
    print(kwargs)
    
g(1, 2, x=10, y=20) # order needs to be same as definition, here position then keyword args
(1, 2)
{'x': 10, 'y': 20}

Functions as variables¶

Python treats functions as 'first-class objects' (on the same level as numbers, strings, ...) in terms of how they can be used in a program.

Functions can be:

  • assigned to any valid variable name
  • passed in as arguments or returned from functions
  • stored in more complex data structures (e.g., lists).
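A minimal sketch of the last point, storing functions in a dict and a list (the function names here are hypothetical):

```python
def square(x):
    return x * x

def cube(x):
    return x ** 3

# functions stored in a dict, selected by name at runtime
ops = {'square': square, 'cube': cube}
print(ops['cube'](2))      # -> 8

# functions stored in a list, each applied in turn
results = [op(3) for op in [square, cube]]
print(results)             # -> [9, 27]
```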

function names can be overwritten¶

In [79]:
def f(x):
    return x*x

f(2)
Out[79]:
4
In [80]:
print('initial', f(2))

def f(x):
    return x

print('final',f(2))
initial 4
final 2

Note what happens if the above cell (only) is run a second time.

the id function¶

In [2]:
max(1,2)
Out[2]:
2
In [1]:
id(max)
Out[1]:
1704608609440
In [3]:
original_max = max

id(original_max)
Out[3]:
1704608609440
In [4]:
def max(a,b):
    return a

max(1,2)
Out[4]:
1
In [6]:
original_max(1,2)
Out[6]:
2
In [7]:
print(id(max), id(original_max))
1704692034720 1704608609440
In [18]:
# note the same thing can happen with variables

x = 5

print(id(x))

x = x+1 

print(id(x))
1704608295344
1704608295376
In [53]:
# "factory pattern": returns a function based on an input selection

def add1(x,y):
    return x+y

def add2(x,y):
    return -1*(x+y)

addtype = '2'

def addfactory(addtype):
    if addtype=='1':
        adder = add1
    elif addtype=='2':
        adder = add2
    else:
        adder = ''
        print('unrecognized selection')
    return adder
    
myadder = addfactory(addtype)              
myadder(5,5)
Out[53]:
-10
In [54]:
adder
Out[54]:
<function __main__.add2(x, y)>
In [56]:
print(id(add1), id(add2), id(myadder))
1704692903216 1704692903072 1704692903072
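The if/elif chain in addfactory is often replaced by a dict lookup. A sketch of the same factory (using the adders defined above) under that design, where a bad selection raises an exception instead of returning a placeholder like '':

```python
def add1(x, y):
    return x + y

def add2(x, y):
    return -1 * (x + y)

def addfactory(addtype):
    # dict dispatch: map each selection string to a function object
    adders = {'1': add1, '2': add2}
    if addtype not in adders:
        raise ValueError('unrecognized selection: ' + repr(addtype))
    return adders[addtype]

myadder = addfactory('2')
print(myadder(5, 5))       # -> -10
```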

Functions of functions¶

In [15]:
def myfunc(a):
    return int(a)

# this tester checks for ability to handle bad inputs
def mytester(f, x):
    try:
        print('testing...', x)
        f(x)
        print('test passed')
    except:
        print('test failed')
    

mytester(myfunc, 1)
mytester(myfunc, '1')
mytester(myfunc, 'a')
testing... 1
test passed
testing... 1
test passed
testing... a
test failed

Decorators¶

A decorator is a function that takes a function and returns a function. In effect it wraps extra code invisibly around the function, which runs in addition to the function itself.

In [57]:
def my_decorator(f):
    def wrapper(x):
        print("before")
        result = f(x)
        print("after")
        return result
    return wrapper
In [58]:
@my_decorator
def f(x):
    return x * 2
In [59]:
f(x)
before
after
Out[59]:
12
In [106]:
import time

def my_timer_decorator(func):
    def wrapper(*args, **kwargs): # collect a variable number of arguments into args/kwargs; kwargs means keyword arguments
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print("function took", elapsed, "seconds")
        return result
    return wrapper
In [107]:
@my_timer_decorator
def f100(x):
    return x**100

f100(2.3)
function took 6.299989763647318e-06 seconds
Out[107]:
1.4886191506362924e+36
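One caveat: after decorating, f100.__name__ reports 'wrapper', not 'f100'. The standard fix is functools.wraps, which copies the original function's name and docstring onto the wrapper. A sketch of the same timer with that added:

```python
import functools
import time

def my_timer_decorator(func):
    @functools.wraps(func)   # copy func's __name__, docstring, etc. onto wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print("function took", time.perf_counter() - start, "seconds")
        return result
    return wrapper

@my_timer_decorator
def f100(x):
    return x ** 100

print(f100.__name__)   # -> f100 (would be 'wrapper' without functools.wraps)
```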

Annotations¶

Add information ("metadata") to the function itself. Not used by Python, but available to external tools (debuggers, type checkers, etc.)

In [20]:
def area(r):
    return 3.14159 * r * r

area
Out[20]:
<function __main__.area(r)>
In [21]:
def area(r: float) -> float:
    return 3.14159 * r * r

area
Out[21]:
<function __main__.area(r: float) -> float>
In [33]:
area.__annotations__
Out[33]:
{'r': float, 'return': float}

Lambda functions¶

An "inline" way to define a function that computes a single expression. Sometimes convenient.

In [25]:
f = lambda x: x * x

print(f)
print(f(3))   # 9
<function <lambda> at 0x0000018CE782FE50>
9
In [27]:
def square(x):
    return x*x

print(square)
print(square(3))   # 9
<function square at 0x0000018CE7902280>
9
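Lambdas are most useful as short throwaway arguments to other functions, for example as a sort key:

```python
words = ['banana', 'fig', 'cherry']

# sort by word length using an inline lambda
print(sorted(words, key=lambda w: len(w)))   # -> ['fig', 'banana', 'cherry']

# the equivalent named-function version
def by_length(w):
    return len(w)

print(sorted(words, key=by_length))          # same result
```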

Recursion¶

Functions that recursively call themselves

In [49]:
def recurf(i):
    i = i+1 
    print(i)
    if i>10: 
        print('done\n')
        return
    recurf(i)
    
recurf(1)
recurf(5)
2
3
4
5
6
7
8
9
10
11
done

6
7
8
9
10
11
done

Ex: Fibonacci sequence¶

\begin{align} f(n) &= f(n-1) + f(n-2) \\ f(0) &= 0 \\ f(1) &= 1 \\ \end{align}

The recursive calculation is inefficient due to repeatedly calculating the same terms

In [65]:
def fib_recursive(n):
    if n == 0: return 0
    if n == 1: return 1
    return fib_recursive(n-1) + fib_recursive(n-2)
[figure: recursion tree for fib_recursive(n)]

Each term requires two additional terms be calculated. Exponential time.
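A quick way to see the exponential blow-up is to count calls. A sketch instrumenting fib_recursive with a global counter (the counter is an addition for illustration):

```python
calls = 0

def fib_recursive(n):
    global calls
    calls += 1                 # count every invocation
    if n == 0: return 0
    if n == 1: return 1
    return fib_recursive(n-1) + fib_recursive(n-2)

for n in [10, 15, 20]:
    calls = 0
    fib_recursive(n)
    print(n, '->', calls, 'calls')   # 10 -> 177, 15 -> 1973, 20 -> 21891
```

The call count grows roughly by a factor of the golden ratio per increment of n, i.e. exponentially.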

Dynamic Programming¶

Intelligently plan terms to calculate and store ("cache"), e.g. in a table.

[figure: bottom-up calculation of Fibonacci terms]

Each term requires one term be calculated.

Dynamic Programming Caching¶

Always plan out the data structure and calculation order.

  • need to make sure you have sufficient space
  • need to choose optimal strategy to fill in

The data structure to fill in for Fibonacci is trivial:

[figure: a 1-D array of Fibonacci terms 0..n]

The optimal order is to start at the bottom and work up to $n$, so you always have what you need for the next term.

In [108]:
def fib_recursive(n):
    if n == 0: return 0
    if n == 1: return 1
    return fib_recursive(n-1) + fib_recursive(n-2)
In [109]:
def fib_dp(n):
    fib_seq = [0, 1]
    for i in range(2,n+1):
        fib_seq.append(fib_seq[i-1] + fib_seq[i-2])
    return fib_seq[n]
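The list cache in fib_dp stores all n+1 terms, but only the last two are ever needed. A constant-space variant (a common refinement, not from the original):

```python
def fib_iterative(n):
    # keep only the last two terms instead of the whole table
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib_iterative(10))   # -> 55
```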

Let's compare the runtime

In [34]:
%timeit -n4 fib_recursive(30)
390 ms ± 25.3 ms per loop (mean ± std. dev. of 7 runs, 4 loops each)
In [35]:
%timeit -n4 fib_dp(100)  
14.5 µs ± 1.79 µs per loop (mean ± std. dev. of 7 runs, 4 loops each)

Algorithm Complexity¶

Complexity in algorithm theory refers to the runtime of an algorithm, usually described by a function which bounds the worst-case number of computations based on the input size.

In data science we basically want at most linear complexity, meaning the computation time is a scalar multiple of the data size.

Exponential complexity is practically unusable. Such algorithms are replaced by rough approximations out of necessity.

Memoization¶

Memoization temporarily stores the results of processed inputs, such as the results of function calls.

If the same input (or a function call with the same parameters) occurs again, the previously stored result can be reused and unnecessary calculations are avoided.

In many cases a simple array is used for storing the results, but many other structures can be used as well, such as associative arrays (called hashes in Perl or dictionaries in Python).

memoization example

In [138]:
def fib_recursive(n):
    if n == 0: return 0
    if n == 1: return 1
    return fib_recursive(n-1) + fib_recursive(n-2)
In [139]:
def fib_dp(n):
    cache = [0,1]
    
    for i in range(2,n+1):
        cache.append(cache[i-1]+cache[i-2])
    return cache[n]
In [146]:
def memoize(f):
    memo = {}
    def helper(x):
        if x not in memo:            
            memo[x] = f(x)
        return memo[x]
    return helper

fib_recursive_memoized = memoize(fib_recursive)
In [147]:
fib_recursive_memoized(9)
Out[147]:
34
In [148]:
%timeit fib_recursive(9)
6.23 μs ± 177 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [149]:
%timeit fib_recursive_memoized(9)
80.7 ns ± 1.49 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

Memoization in Python with LRU cache¶

LRU = Least Recently Used

The docs on lru_cache

In [150]:
from functools import lru_cache

@lru_cache()
def fib_recursive(n):
    "Calculate nth Fibonacci number using recursion"
    if n == 0: return 0
    if n == 1: return 1
    return fib_recursive(n-1) + fib_recursive(n-2)
In [152]:
%timeit fib_recursive(30)
55 ns ± 1.69 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
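The lru_cache wrapper also exposes cache statistics and a way to empty the cache, via its documented cache_info() and cache_clear() methods:

```python
from functools import lru_cache

@lru_cache(maxsize=None)    # maxsize=None: the cache grows without eviction
def fib_recursive(n):
    if n == 0: return 0
    if n == 1: return 1
    return fib_recursive(n-1) + fib_recursive(n-2)

fib_recursive(30)
print(fib_recursive.cache_info())   # hits, misses, and current cache size
fib_recursive.cache_clear()         # empty the cache, e.g. after editing the function
```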

Memoization in Python with joblib¶

https://joblib.readthedocs.io/en/latest/

In [91]:
import joblib

#dir(joblib)

Exercise: Compute a polynomial over a list of points¶

How can we use redundancy here?
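One possible answer (a sketch, not necessarily the intended solution): Horner's method reuses each partial product instead of recomputing every power of x from scratch. The coefficients and points below are hypothetical examples.

```python
def poly_naive(coeffs, x):
    # coeffs[i] is the coefficient of x**i; each term recomputes x**i
    return sum(c * x**i for i, c in enumerate(coeffs))

def poly_horner(coeffs, x):
    # evaluate (((c_n*x + c_{n-1})*x + ...)*x + c_0): one multiply per term
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

coeffs = [1, 2, 3]   # 1 + 2x + 3x^2
points = [0, 1, 2]
print([poly_horner(coeffs, x) for x in points])   # -> [1, 6, 17]
```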

Recursion summary¶

In a recursive approach, your function recursively calls itself on smaller subproblems as needed, until the calculation is done. Done naively, this can perform many redundant calculations.

With memoization, you form a cache of these recursive calls, and check whether the same call was already made before recalculating. There is still a potential danger, since you do not plan memory needs or usage.

With dynamic programming, you implement a bottom-up plan to produce the cache of results for smaller subproblems.