Classes and OOP¶

  • OOP-like programming without classes
  • Structs
  • Classes
  • Dunder methods
  • Inheritance

Motivation¶

Generally, classes are a way to bundle together variables and functions.

The bundle can then operate on its own set of internal states, like a machine.

Classes can also be extended into new classes (inheritance).

Example: a class for performing statistics calculations on a data file.

Without a class, you would write a separate function for each statistic you want, and each function would have to load the data and check its validity. With a class, you have one "startup function" that loads and checks the data and maintains some or all of it internally. Subsequent functions can then use these internals.

"I prefer (math-like) functional programming versus OOP"¶

Generally in Data Science you can get by almost never using OOP.

But you still need it to understand what's going on in Python (e.g., everything is an object, even ints).
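As a quick check of the claim that even ints are objects, the following sketch uses only built-ins. The variable name n is just for illustration.

```python
# Even a plain integer is an instance of a class (int) and has methods.
n = 5
print(type(n))                # <class 'int'>
print(isinstance(n, object))  # True
print(n.bit_length())         # 3, since 5 is 0b101
print(n.__add__(3))           # 8, the method behind n + 3
```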

Major external libraries (numpy, scipy, pytorch) are all class-based as well.

Many will require you to use advanced OOP features to do specialized tasks, like using your own custom model with the module's automated parallelism or optimization features. This requires you to use inheritance to extend the library's original class.

OOP-like programming manually¶

Classes are only a convenience that allows you to easily do things you could already do other ways.

Consider the stats example. Implement it manually.

Define functions for a dataset which is a list of numbers:

  • checking data is not empty and has at least 2 values
  • computing population variance
  • computing standard deviation
  • nicely printing the mean, std, and length of a dataset if valid, or printing a message if not valid
In [11]:
# original functions

import math
from statistics import mean  # mean() is used by var() and print_summary()

def check_data(data):
    if not data:
        return False
    if len(data) < 2:
        return False
    return True

def var(data):
    m = mean(data)
    return sum((x - m)**2 for x in data) / len(data)

def std(data):
    return math.sqrt(var(data))

def print_summary(data):
    if check_data(data):
        print('data length: ',len(data),'\nmean: ', mean(data), '\nstd:', std(data))
    else:
        print('data not valid')
        
print_summary([1,2,3,4])
data length:  4 
mean:  2.5 
std: 1.118033988749895

Quasi-OOP example¶

Collect the functions together with a naming convention, here a Stats_ prefix.

This helps avoid potential conflicts with the same names used elsewhere, e.g. if we defined our own mean function.

In [13]:
import math
from statistics import mean  # mean() is used by Stats_var() and Stats_print_summary()

def Stats_check_data(data):
    if not data:
        return False
    if len(data) < 2:
        return False
    return True

def Stats_var(data):
    m = mean(data)
    return sum((x - m)**2 for x in data) / len(data)

def Stats_std(data):
    return math.sqrt(Stats_var(data))

def Stats_print_summary(data):
    if Stats_check_data(data):
        print('data length: ',len(data),'\nmean: ', mean(data), '\nstd:', Stats_std(data))
    else:
        print('data not valid')
        
Stats_print_summary([1,2,3,4])
data length:  4 
mean:  2.5 
std: 1.118033988749895

Maintaining internal data¶

Suppose you wanted to also maintain the internal data and intermediate computed values to reuse in other functions.

You could do the same kind of thing, e.g. create globals Stats_computed_std and Stats_data_is_valid to share between functions.

In [18]:
import math
from statistics import mean  # mean() is used by Stats_var() and Stats_print_summary()

def Stats_check_data(data):
    if not data:
        return False
    if len(data) < 2:
        return False
    return True

def Stats_var(data):
    m = mean(data)
    return sum((x - m)**2 for x in data) / len(data)

def Stats_std(data):
    return math.sqrt(Stats_var(data))

def Stats_print_summary(data, is_valid, computed_std):
    if is_valid:
        print('data length: ',len(data),'\nmean: ', mean(data), '\nstd:', computed_std)
    else:
        print('data not valid')
        
data = [1,2,3,4]

Stats_data_is_valid = Stats_check_data(data)
Stats_computed_std = Stats_std(data)

Stats_print_summary(data, Stats_data_is_valid, Stats_computed_std)
data length:  4 
mean:  2.5 
std: 1.118033988749895

'Structs'¶

A struct is a general data structure that can hold several different variables.

In C (which does not support OOP) it is basically a class that can only hold data ("attributes").

In Python we'd make a class but only put data in it. Access the data using the attribute access operator ".":

In [24]:
# use the keyword class followed by a name of your choice
class Info:
    course = 'python'
    day = 'Tuesday'
    
# above is just a "template" for the class. actually construct one as follows (can name it anything you want)
info = Info()
info.course
Out[24]:
'python'
In [8]:
info
Out[8]:
<__main__.Info at 0x1b1bc080590>

Note you can add attributes at any time to change the contents.

In [25]:
info.morestuff = 5

info.morestuff
Out[25]:
5
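One subtlety worth seeing in code: values written in the class body are shared defaults, while assigning through an instance creates an attribute on that instance only. A minimal sketch, reusing the Info class from above (the instance names a and b are just for illustration):

```python
class Info:
    course = 'python'   # class attribute: shared default for all instances

a = Info()
b = Info()

a.course = 'statistics'   # creates an instance attribute on a only
print(a.course)           # statistics
print(b.course)           # python (b still reads the class attribute)
print(Info.course)        # python

a.morestuff = 5                  # a new attribute, again on a only
print(hasattr(b, 'morestuff'))   # False
```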
In [22]:
# using struct for our internal data

import math
from statistics import mean  # mean() is used by Stats_var() and Stats_print_summary()

def Stats_check_data(data):
    if not data:
        return False
    if len(data) < 2:
        return False
    return True

def Stats_var(data):
    m = mean(data)
    return sum((x - m)**2 for x in data) / len(data)

def Stats_std(data):
    return math.sqrt(Stats_var(data))

def Stats_print_summary(data, is_valid, computed_std):
    if is_valid:
        print('data length: ',len(data),'\nmean: ', mean(data), '\nstd:', computed_std)
    else:
        print('data not valid')

data = [1,2,3,4]

# define the data-containing struct
class Stats:
    data_is_valid = False
    computed_std = 0

# create an instance
stats = Stats() 

stats.data_is_valid = Stats_check_data(data)
stats.computed_std = Stats_std(data)

Stats_print_summary(data, stats.data_is_valid, stats.computed_std)
data length:  4 
mean:  2.5 
std: 1.118033988749895

Classes¶

Think of classes as structs that hold internal functions ("methods") as well as data.

In python you define a class by using the keyword "class" followed by a name you choose.

After that you can state what internal functions or variables it has.

This just defines a class, similar to defining a function. Again, nothing happens until it is instantiated.

In [29]:
class Myclass:
    def donothing(self):
        return

Instantiating a class creates an object (an instance). Notice we called the class name like a function; this call is the class's constructor.

The constructor may not perform any operations yet; it just makes the object exist in memory, like a new variable.

In [30]:
mc = Myclass()
print(type(mc))
<class '__main__.Myclass'>

Methods operate on internal data via a self reference that we must explicitly include in the definition (but not when calling the method on an instance).

In [34]:
class Myclass:
    x = 5
    def printx(self): # self must be first argument
        print(self.x)

mc = Myclass()
mc.printx() # <-- note not passing self in use
5
In [35]:
mc.x
Out[35]:
5

As noted above, self is a reference to the instance, not a copy of it, so changes made through self persist.

In [43]:
class Myclass:
    x = 5 # just the initial value
    def printx(self): # self must be first argument
        print(self.x)
    def increment(self): # self must be first argument
        self.x = self.x + 1
    def add(self, y): # self must be first argument
        self.x = self.x + y
        
mc = Myclass()
mc.printx() 
mc.increment()
mc.increment()
mc.printx() 
mc.add(10) # <- y, the 2nd parameter in the definition, is the 1st argument in the call
mc.printx() 
5
7
17
In [44]:
mc2 = Myclass() # new instance starts over
mc2.printx()
5

The constructor __init__ initializes instance state at creation time.

In [38]:
class Myclass:
    def __init__(self, x0=0):
        self.x = x0
    def printx(self): # self must be first argument
        print(self.x)
    def increment(self): # self must be first argument
        self.x = self.x + 1
In [39]:
mc2 = Myclass(123)
mc2.printx()
123
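Putting these pieces together, the stats example from the start of this section could be written as a class. This is a minimal sketch, not a definitive design: the class name Stats, its attributes, and its method names are chosen here for illustration, and mean is implemented as a method so the block is self-contained.

```python
import math

class Stats:
    """Hypothetical class version of the Stats_* helper functions."""

    def __init__(self, data):
        # the "startup function": store the data and validate it once
        self.data = list(data)
        self.is_valid = len(self.data) >= 2

    def mean(self):
        return sum(self.data) / len(self.data)

    def var(self):
        # population variance, as in the function version
        m = self.mean()
        return sum((x - m) ** 2 for x in self.data) / len(self.data)

    def std(self):
        return math.sqrt(self.var())

    def print_summary(self):
        if self.is_valid:
            print('data length:', len(self.data),
                  '\nmean:', self.mean(), '\nstd:', self.std())
        else:
            print('data not valid')

stats = Stats([1, 2, 3, 4])
stats.print_summary()
Stats([7]).print_summary()   # prints: data not valid
```

The data and its validity flag now travel with the object, so no globals or extra function arguments are needed.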

Attributes are public by default.

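A single leading underscore is a convention indicating the attribute should be treated as internal ("protected"); Python does not enforce it.

Two leading underscores trigger name mangling, which makes accidental outside access fail but is not true enforcement of privacy.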

In [41]:
class Example:
    def __init__(self):
        self.public = 1
        self._internal = 2 
        self.__mangled = 3
        
e = Example()
print(e.public)          # 1
print(e._internal)       # 2
print(e.__mangled) # would fail
1
2
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_1400\1456861399.py in <module>
      8 print(e.public)          # 1
      9 print(e._internal)       # 2
---> 10 print(e.__mangled) # would fail

AttributeError: 'Example' object has no attribute '__mangled'
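The error above comes from name mangling: inside the class body, an attribute with two leading underscores is stored under a name prefixed with an underscore and the class name. A short sketch showing that the value is still reachable under the mangled name, which is why this is a safeguard against accidents rather than real privacy:

```python
class Example:
    def __init__(self):
        self.__mangled = 3   # stored on the instance as _Example__mangled

e = Example()
print(e._Example__mangled)               # 3
print('_Example__mangled' in vars(e))    # True
print(hasattr(e, '__mangled'))           # False
```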

Convention is to use "getters" to access private members¶

This allows module internals to be changed without breaking code that is over-specialized to a previous implementation.

In [46]:
class Example:
    def __init__(self):
        self.__x = 3
    def get(self):
        return self.__x

ex = Example()
ex.get()
Out[46]:
3
In [47]:
ex.__x
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_1400\901071837.py in <module>
----> 1 ex.__x

AttributeError: 'Example' object has no attribute '__x'
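In Python, the same goal is often reached with the built-in property decorator, which lets a getter be read like a plain attribute. The sketch below is one possible variant of the Example class, assuming a single-underscore attribute _x for the stored value; it is not the only way to do this.

```python
class Example:
    def __init__(self):
        self._x = 3

    @property
    def x(self):
        # read-only access; no setter is defined
        return self._x

ex = Example()
print(ex.x)    # 3

try:
    ex.x = 10  # assignment fails because the property has no setter
except AttributeError:
    print("x is read-only")
```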

Dunder methods (“double underscore”)¶

Dunder methods are "hooks" into Python’s core language semantics.

A dunder method is a specially named method of the form __name__. Python calls these methods implicitly in response to syntax or built-in operations.

For example, __init__ is called automatically when an instance of a class is created ("construction").

Object construction and representation are controlled by dunder methods.

class Point:
    # constructor. called by Point()
    def __init__(self, x, y):
        self.x = x
        self.y = y

    # string representation for printing   
    def __repr__(self):
        return f"Point({self.x}, {self.y})"

p = Point(1, 2)
print(p)          # Point(1, 2)
In [32]:
# these are built-in for classes we already know

a = list([1,2,3])
print(a)
[1, 2, 3]
In [28]:
a.__repr__()
Out[28]:
'[1, 2, 3]'
In [31]:
# dir(a)

Arithmetic and comparison operators

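class Vector:
    def __init__(self, x):
        self.x = x

    # without __repr__, print would show something like <__main__.Vector object at 0x...>
    def __repr__(self):
        return f"Vector({self.x})"

    def __add__(self, other):
        return Vector(self.x + other.x)

    def __eq__(self, other):
        return self.x == other.x

v1 = Vector(2)
v2 = Vector(3)

print(v1 + v2)    # Vector(5)
print(v1 == v2)   # False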

Container behavior

class Bag:
    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)

    def __contains__(self, item):
        return item in self.items

b = Bag([1, 2, 3])
print(len(b))     # 3
print(2 in b)     # True

Iteration protocol, based on the __iter__ and __next__ methods

class CountDown:
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

for x in CountDown(3):
    print(x)
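To see what the for loop does behind the scenes, the same protocol can be driven by hand with the built-ins iter() and next(). This sketch repeats the CountDown class so it runs on its own; the variable name it is just for illustration.

```python
class CountDown:
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

it = iter(CountDown(3))   # calls CountDown.__iter__
print(next(it))           # 3  (calls CountDown.__next__)
print(next(it))           # 2
print(next(it))           # 1
try:
    next(it)
except StopIteration:
    print("done")         # a for loop stops silently at this point

print(list(CountDown(3)))  # [3, 2, 1]
```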

Function-like (callable) behavior, defined explicitly with __call__

class Squarer:
    def __call__(self, x):
        return x * x

f = Squarer()
print(f(4))       # 16
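A callable object becomes more useful than a plain function when it carries state. The following sketch uses a hypothetical Scaler class whose factor is fixed at construction time:

```python
class Scaler:
    def __init__(self, factor):
        self.factor = factor   # state remembered between calls

    def __call__(self, x):
        return self.factor * x

double = Scaler(2)
print(double(4))                      # 8
print(callable(double))               # True
print(list(map(double, [1, 2, 3])))   # [2, 4, 6]
```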

Context managers (the with statement), based on __enter__ and __exit__

class Demo:
    def __enter__(self):
        print("enter")
        return self

    def __exit__(self, exc_type, exc, tb):
        print("exit")

with Demo():
    pass
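The main point of a context manager is that __exit__ runs even when the body of the with block raises. A minimal sketch, with the Demo class modified to record events in a list (the events attribute is an addition for illustration):

```python
class Demo:
    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc, tb):
        # exc_type is None when the block finished without an exception
        self.events.append("exit")
        return False   # a false value means: do not suppress the exception

demo = Demo()
try:
    with demo:
        raise ValueError("boom")
except ValueError:
    print("ValueError reached the caller")

print(demo.events)   # ['enter', 'exit']
```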

Dunder summary¶

  • Dunder methods are the interface to the Python interpreter; they define how objects interact with syntax.
  • You only need to implement them when you want native-like behavior.

For example, when your object should behave like a number, container, function, iterator, or context manager.

Advanced: Inheritance¶

Inheritance allows reuse and specialization of classes.

In [49]:
# parent class
class Animal:
    def speak(self):
        return "sound"

# child class
class Dog(Animal):
    def speak(self):
        return "bark"

d = Dog()
print(d.speak())   # bark
bark
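Inheritance also means that code written against the parent works with any child. In the sketch below, describe is defined once in Animal and automatically picks up whichever speak the instance provides. The describe method and the Cat class are hypothetical additions for illustration.

```python
class Animal:
    def speak(self):
        return "sound"

    def describe(self):
        # defined once in the parent; uses the instance's own speak()
        return f"{type(self).__name__} says {self.speak()}"

class Dog(Animal):
    def speak(self):
        return "bark"

class Cat(Animal):   # hypothetical extra subclass
    def speak(self):
        return "meow"

for a in [Animal(), Dog(), Cat()]:
    print(a.describe())

print(isinstance(Dog(), Animal))   # True
print(issubclass(Dog, Animal))     # True
```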

super()¶

super() delegates a method call to the parent class (more precisely, to the next class in the method resolution order).

In [50]:
class LoggedDog(Dog):
    def speak(self):
        base = super().speak()
        return f"Dog says {base}"

ld = LoggedDog()
ld.speak()
Out[50]:
'Dog says bark'
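A very common use of super() is in constructors, so that the child reuses the parent's initialization instead of repeating it. A minimal sketch, assuming hypothetical name and tricks attributes that are not part of the classes above:

```python
class Animal:
    def __init__(self, name):
        self.name = name

class Dog(Animal):
    def __init__(self, name, tricks=None):
        super().__init__(name)            # let the parent set self.name
        self.tricks = list(tricks or [])  # then add child-specific state

d = Dog("Rex", ["sit"])
print(d.name, d.tricks)   # Rex ['sit']
```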

Yet More Advanced: Multiple inheritance¶

Classes can inherit from multiple parents:

class ChildClass(TopParent, NextParent, LastParent):

E.g. start with a class that does the job you want (Processor), then create an enhanced version of it by mixing in classes that add logging and timing:

In [19]:
import time

class Processor:
    def process(self, x):
        return x * 2

class Logged:
    def process(self, x):
        print("processing", x)
        return super().process(x)

class Timed:
    def process(self, x):
        start = time.time()
        result = super().process(x)
        print("elapsed:", time.time() - start)
        return result
In [20]:
class LoggedTimedProcessor(Logged, Timed, Processor):
    pass

p = LoggedTimedProcessor()
print(p.process(10))
processing 10
elapsed: 2.384185791015625e-06
20
In [21]:
class LoggedTimedProcessor2(Processor, Logged, Timed):
    pass

p = LoggedTimedProcessor2()
print(p.process(10))
20
In [22]:
LoggedTimedProcessor.mro() # method resolution order
Out[22]:
[__main__.LoggedTimedProcessor,
 __main__.Logged,
 __main__.Timed,
 __main__.Processor,
 object]
In [23]:
LoggedTimedProcessor2.mro()
Out[23]:
[__main__.LoggedTimedProcessor2,
 __main__.Processor,
 __main__.Logged,
 __main__.Timed,
 object]
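The two results differ because super() forwards to the next class in the instance's method resolution order, not to a fixed parent. When Processor comes first, its process runs immediately and never calls super(), so Logged and Timed are skipped. The sketch below makes the call chain visible by recording class names in a list; the calls list and the class names MixinsFirst and BaseFirst are hypothetical, for illustration only.

```python
calls = []

class Processor:
    def process(self, x):
        calls.append("Processor")
        return x * 2              # end of the chain: no super() call

class Logged:
    def process(self, x):
        calls.append("Logged")
        return super().process(x)  # next class in the MRO

class Timed:
    def process(self, x):
        calls.append("Timed")
        return super().process(x)

class MixinsFirst(Logged, Timed, Processor):
    pass

class BaseFirst(Processor, Logged, Timed):
    pass

MixinsFirst().process(10)
print(calls)   # ['Logged', 'Timed', 'Processor']

calls.clear()
BaseFirst().process(10)
print(calls)   # ['Processor']
```

Under this reading, the wrapper classes should be listed before the class that does the actual work.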