Generally, classes are a way to bundle together variables and functions.
The bundle can then operate on its own set of internal state, like a machine.
Classes can also be repackaged into new classes (inheritance).
Example: a class for performing statistics calculations on a data file.
Rather than writing a separate function for each statistic you want, where each function must load the data and check it for validity, you have one "startup function" that loads and checks the data and maintains some or all of it internally. Subsequent functions can then use these internals.
Generally in Data Science you can get by almost never using OOP.
But you still need it to understand what's going on with Python (e.g., every value is an instance of a class, even ints).
Major external libraries (numpy, scipy, pytorch) are all class-based as well.
Many will require you to use advanced OOP features to do specialized tasks, like using your own custom model with the module's automated parallelism or optimization features. This requires you to use inheritance to extend the original module's class.
Classes are only a convenience that allows you to easily do things you could already do other ways.
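The remark above that even ints are class instances can be checked directly. This is a small sketch using only built-ins; the variable names are chosen here for illustration.

```python
n = 5
int_type = type(n)                  # the class of the value 5
bits = n.bit_length()               # ints have methods: 5 is 0b101, so 3 bits
total = n.__add__(2)                # the + operator calls this method
is_object = isinstance(n, object)   # every value is an instance of object

print(int_type, bits, total, is_object)   # <class 'int'> 3 7 True
```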
Consider the stats example. Implement it manually.
Define functions for a dataset which is a list of numbers:
# original functions
import math
def check_data(data):
    # need at least two numbers
    if not data:
        return False
    if len(data) < 2:
        return False
    return True
def mean(data):
    # arithmetic mean, used by var() and print_summary()
    return sum(data) / len(data)
def var(data):
    # population variance (divides by n)
    m = mean(data)
    return sum((x - m)**2 for x in data) / len(data)
def std(data):
    return math.sqrt(var(data))
def print_summary(data):
    if check_data(data):
        print('data length: ',len(data),'\nmean: ', mean(data), '\nstd:', std(data))
    else:
        print('data not valid')
print_summary([1,2,3,4])
data length: 4
mean: 2.5
std: 1.118033988749895
Collecting functions together with a naming convention
This helps avoid potential conflicts with the same names used elsewhere, e.g. if we defined our own mean function.
import math
def Stats_check_data(data):
    if not data:
        return False
    if len(data) < 2:
        return False
    return True
def Stats_mean(data):
    # our own mean; the prefix keeps it from clashing with any other mean()
    return sum(data) / len(data)
def Stats_var(data):
    m = Stats_mean(data)
    return sum((x - m)**2 for x in data) / len(data)
def Stats_std(data):
    return math.sqrt(Stats_var(data))
def Stats_print_summary(data):
    if Stats_check_data(data):
        print('data length: ',len(data),'\nmean: ', Stats_mean(data), '\nstd:', Stats_std(data))
    else:
        print('data not valid')
Stats_print_summary([1,2,3,4])
data length: 4
mean: 2.5
std: 1.118033988749895
Suppose you wanted to also maintain the internal data and intermediate computed values to reuse in other functions.
You could do the same kind of thing, e.g. create globals Stats_computed_std and Stats_data_is_valid to share:
import math
def Stats_check_data(data):
    if not data:
        return False
    if len(data) < 2:
        return False
    return True
def Stats_mean(data):
    return sum(data) / len(data)
def Stats_var(data):
    m = Stats_mean(data)
    return sum((x - m)**2 for x in data) / len(data)
def Stats_std(data):
    return math.sqrt(Stats_var(data))
def Stats_print_summary(data, is_valid, computed_std):
    # reuses the values computed earlier instead of recomputing them
    if is_valid:
        print('data length: ',len(data),'\nmean: ', Stats_mean(data), '\nstd:', computed_std)
    else:
        print('data not valid')
data = [1,2,3,4]
Stats_data_is_valid = Stats_check_data(data)
Stats_computed_std = Stats_std(data)
Stats_print_summary(data, Stats_data_is_valid, Stats_computed_std)
data length: 4
mean: 2.5
std: 1.118033988749895
A struct is a general data structure that can hold different variables.
In C (which does not support OOP) it is basically a class that can only hold data ("attributes").
In Python we'd make a class but only put data in it. Access the data using the attribute access operator ".":
# use the keyword class followed by a name of your choice
class Info:
    course = 'python'
    day = 'Tuesday'
# above is just a "template" for the class. actually construct one as follows (can name it anything you want)
info = Info()
info.course
'python'
info
<__main__.Info at 0x1b1bc080590>
Note you can add attributes at any time to change the contents
info.morestuff = 5
info.morestuff
5
# using struct for our internal data
import math
def Stats_check_data(data):
    if not data:
        return False
    if len(data) < 2:
        return False
    return True
def Stats_mean(data):
    return sum(data) / len(data)
def Stats_var(data):
    m = Stats_mean(data)
    return sum((x - m)**2 for x in data) / len(data)
def Stats_std(data):
    return math.sqrt(Stats_var(data))
def Stats_print_summary(data, is_valid, computed_std):
    if is_valid:
        print('data length: ',len(data),'\nmean: ', Stats_mean(data), '\nstd:', computed_std)
    else:
        print('data not valid')
data = [1,2,3,4]
# define the data-containing struct
class Stats:
    data_is_valid = False
    computed_std = 0
# create an instance
stats = Stats()
stats.data_is_valid = Stats_check_data(data)
stats.computed_std = Stats_std(data)
Stats_print_summary(data, stats.data_is_valid, stats.computed_std)
data length: 4
mean: 2.5
std: 1.118033988749895
Think of classes as structs with internal functions as well as data.
In Python you define a class by using the keyword "class" followed by a name you choose.
After that you can state what internal functions or variables it has.
This just defines a class, similar to defining a function. Again, nothing happens until it is instantiated.
class Myclass:
    def donothing(self):
        return
Instantiating a class creates an object (an instance). Notice we called the class name like a function. This is called its constructor function.
It may not do any operations yet, just make it exist in memory like a new variable.
mc = Myclass()
print(type(mc))
<class '__main__.Myclass'>
Functions operate on internal data via a self reference that we must explicitly include in the definition (but not when calling the function on an instance)
class Myclass:
    x = 5
    def printx(self): # self must be first argument
        print(self.x)
mc = Myclass()
mc.printx() # <-- note not passing self in use
5
mc.x
5
As noted above, self is a reference to the instance, not a copy of it
class Myclass:
    x = 5 # just the initial value
    def printx(self): # self must be first argument
        print(self.x)
    def increment(self): # self must be first argument
        self.x = self.x + 1
    def add(self, y): # self must be first argument
        self.x = self.x + y
mc = Myclass()
mc.printx()
mc.increment()
mc.increment()
mc.printx()
mc.add(10) # <- 2nd argument in class definition is now first argument
mc.printx()
5
7
17
mc2 = Myclass() # new instance starts over
mc2.printx()
5
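Why a new instance "starts over" can be seen by looking at where x is stored. The sketch below uses a hypothetical Counter class with the same pattern as Myclass: x = 5 lives on the class, and the first assignment to self.x creates a separate attribute on that one instance.

```python
class Counter:              # hypothetical class, same pattern as Myclass
    x = 5                   # class attribute: shared initial value

    def increment(self):
        # reads the class value the first time, then stores the result on the instance
        self.x = self.x + 1

a = Counter()
b = Counter()
a.increment()

print(a.x)                  # 6, the instance attribute shadows the class attribute
print(b.x)                  # 5, still reading the class attribute
print(Counter.x)            # 5, the class attribute was never modified
print(vars(a), vars(b))     # {'x': 6} {}
```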
The constructor __init__ initializes instance state at creation time.
class Myclass:
    def __init__(self, x0=0):
        self.x = x0
    def printx(self): # self must be first argument
        print(self.x)
    def increment(self): # self must be first argument
        self.x = self.x + 1
mc2 = Myclass(123)
mc2.printx()
123
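With __init__ available, the stats example from earlier can be written as a single class. This is one possible sketch, not the only way to organize it: the attribute and method names (is_valid, mean, std, summary) are choices made here, and the population variance (divide by n) follows the earlier functions.

```python
import math

class Stats:
    def __init__(self, data):
        # "startup function": store the data and check it once
        self.data = list(data)
        self.is_valid = len(self.data) >= 2
        # cache intermediate values so other methods can reuse them
        self.mean = sum(self.data) / len(self.data) if self.is_valid else None
        self.std = math.sqrt(self.var()) if self.is_valid else None

    def var(self):
        # population variance, using the cached mean
        return sum((x - self.mean) ** 2 for x in self.data) / len(self.data)

    def summary(self):
        if not self.is_valid:
            return 'data not valid'
        return f'data length: {len(self.data)}\nmean: {self.mean}\nstd: {self.std}'

stats = Stats([1, 2, 3, 4])
print(stats.summary())
```

The validity flag and the computed values now travel with the object, so nothing has to be passed around as extra arguments or stored in prefixed globals.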
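Attributes are public by default.
A single leading underscore is a convention indicating that an attribute should be treated as "protected" (internal use).
A double leading underscore triggers name mangling, which makes the attribute harder to reach from outside the class but does not strictly enforce privacy.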
class Example:
    def __init__(self):
        self.public = 1
        self._internal = 2
        self.__mangled = 3
e = Example()
print(e.public) # 1
print(e._internal) # 2
print(e.__mangled) # would fail
1
2
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_1400\1456861399.py in <module>
      8 print(e.public) # 1
      9 print(e._internal) # 2
---> 10 print(e.__mangled) # would fail
AttributeError: 'Example' object has no attribute '__mangled'
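The attribute in the failing lookup above still exists; Python has only stored it under a mangled name that includes the class name. A short sketch showing where it went:

```python
class Example:
    def __init__(self):
        self.__mangled = 3      # stored on the instance as _Example__mangled

e = Example()
print(vars(e))                  # {'_Example__mangled': 3}
print(e._Example__mangled)      # 3
```

So the double underscore mainly protects against accidental name clashes (for example in subclasses); a determined caller can still reach the value.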
This allows module internals to be changed without breaking code that is over-specialized to a previous implementation
class Example:
    def __init__(self):
        self.__x = 3
    def get(self):
        return self.__x
ex = Example()
ex.get()
3
ex.__x
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_1400\901071837.py in <module>
----> 1 ex.__x
AttributeError: 'Example' object has no attribute '__x'
"hooks" into Python’s core language semantics
A dunder method is a specially named method of the form __name__. Python calls these methods implicitly in response to syntax or built-in operations
For example, __init__ is called automatically when an instance of a class is created ("construction")
Object construction and representation are controlled by dunder methods.
class Point:
    # constructor. called by Point()
    def __init__(self, x, y):
        self.x = x
        self.y = y
    # string representation for printing
    def __repr__(self):
        return f"Point({self.x}, {self.y})"
p = Point(1, 2)
print(p) # Point(1, 2)
# these are built-in for classes we already know
a = list([1,2,3])
print(a)
[1, 2, 3]
a.__repr__()
'[1, 2, 3]'
# dir(a)
Arithmetic and comparison operators
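class Vector:
    def __init__(self, x):
        self.x = x
    def __add__(self, other):
        return Vector(self.x + other.x)
    def __eq__(self, other):
        return self.x == other.x
    def __repr__(self):
        # without this, print would show the default <__main__.Vector object at ...>
        return f"Vector({self.x})"
v1 = Vector(2)
v2 = Vector(3)
print(v1 + v2) # Vector(5)
print(v1 == v2) # False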
Container behavior
class Bag:
    def __init__(self, items):
        self.items = items
    def __len__(self):
        return len(self.items)
    def __contains__(self, item):
        return item in self.items
b = Bag([1, 2, 3])
print(len(b)) # 3
print(2 in b) # True
Iteration protocol based on the __iter__ and __next__ methods
class CountDown:
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return self
    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1
for x in CountDown(3):
    print(x) # 3, then 2, then 1
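What the for loop does with these two methods can be spelled out by hand. The sketch below repeats CountDown so it runs on its own and drives it with the built-ins iter() and next(); the list name seen is just for illustration.

```python
class CountDown:
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return self
    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

it = iter(CountDown(3))     # calls CountDown.__iter__
seen = []
while True:
    try:
        x = next(it)        # calls CountDown.__next__
    except StopIteration:   # a for loop stops silently at this point
        break
    seen.append(x)

print(seen)                 # [3, 2, 1]
```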
Function-like behavior explicitly defined
class Squarer:
    def __call__(self, x):
        return x * x
f = Squarer()
print(f(4)) # 16
Context manager behavior for the with statement
class Demo:
    def __enter__(self):
        print("enter")
        return self
    def __exit__(self, exc_type, exc, tb):
        print("exit")
with Demo():
    pass
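The three arguments of __exit__ describe an exception raised inside the with block, if any. The following sketch is a variant of Demo that records events in a list (the events list and the message strings are additions for illustration) to show that __exit__ runs whether or not the body fails.

```python
events = []

class Demo:
    def __enter__(self):
        events.append("enter")
        return self
    def __exit__(self, exc_type, exc, tb):
        # exc_type is None when the block finished normally
        events.append(f"exit ({exc_type.__name__ if exc_type else 'no error'})")
        return False    # a false value means: do not suppress the exception

with Demo():
    events.append("body")

try:
    with Demo():
        raise ValueError("boom")
except ValueError:
    events.append("caught")

print(events)
# ['enter', 'body', 'exit (no error)', 'enter', 'exit (ValueError)', 'caught']
```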
Define dunder methods if your object should behave like a number, container, function, iterator, or context manager.
Inheritance allows reuse and specialization of classes.
# parent class
class Animal:
    def speak(self):
        return "sound"
# child class
class Dog(Animal):
    def speak(self):
        return "bark"
d = Dog()
print(d.speak()) # bark
bark
super() delegates a method call to the parent class
class LoggedDog(Dog):
    def speak(self):
        base = super().speak()
        return f"Dog says {base}"
ld = LoggedDog()
ld.speak()
'Dog says bark'
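Because each child overrides speak, code written against the parent class works unchanged with any of the children (polymorphism). In this sketch the three classes are repeated so it runs on its own; the describe function is a hypothetical helper added for illustration.

```python
class Animal:
    def speak(self):
        return "sound"

class Dog(Animal):
    def speak(self):
        return "bark"

class LoggedDog(Dog):
    def speak(self):
        return f"Dog says {super().speak()}"

def describe(animal):
    # works for any Animal; the object's own class decides which speak() runs
    return animal.speak()

sounds = [describe(a) for a in (Animal(), Dog(), LoggedDog())]
print(sounds)                            # ['sound', 'bark', 'Dog says bark']
print(isinstance(LoggedDog(), Animal))   # True
```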
Creating classes that inherit from multiple parents.
class ChildClass(TopParent, NextParent, LastParent):
    ...
E.g. start with the job you want to perform:
class Processor:
    def process(self, x):
        return x * 2
Create an enhanced version of the original class by adding logging and timing tools:
class Logged:
    def process(self, x):
        print("processing", x)
        return super().process(x)
import time
class Timed:
    def process(self, x):
        start = time.time()
        result = super().process(x)
        print("elapsed:", time.time() - start)
        return result
class LoggedTimedProcessor(Logged, Timed, Processor):
    pass
p = LoggedTimedProcessor()
print(p.process(10))
processing 10
elapsed: 2.384185791015625e-06
20
# Processor is listed first here, so its process() runs and never calls super(): Logged and Timed are skipped
class LoggedTimedProcessor2(Processor, Logged, Timed):
    pass
p = LoggedTimedProcessor2()
print(p.process(10))
20
LoggedTimedProcessor.mro() # method resolution order
[__main__.LoggedTimedProcessor, __main__.Logged, __main__.Timed, __main__.Processor, object]
LoggedTimedProcessor2.mro()
[__main__.LoggedTimedProcessor2, __main__.Processor, __main__.Logged, __main__.Timed, object]
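The method resolution order also determines what super() means inside each parent: it refers to the next class in the MRO of the instance, not to a fixed parent. The sketch below uses hypothetical classes A, B and Base, and an extra trace argument added for illustration, to record the order in which the methods run.

```python
class Base:
    def process(self, x, trace):
        trace.append("Base")
        return x * 2

class A:
    def process(self, x, trace):
        trace.append("A")
        return super().process(x, trace)   # next class in the MRO, not a fixed parent

class B:
    def process(self, x, trace):
        trace.append("B")
        return super().process(x, trace)

class AB(A, B, Base):
    pass

class BA(B, A, Base):
    pass

trace_ab, trace_ba = [], []
AB().process(1, trace_ab)
BA().process(1, trace_ba)

print(trace_ab)                         # ['A', 'B', 'Base']
print(trace_ba)                         # ['B', 'A', 'Base']
print([c.__name__ for c in AB.mro()])   # ['AB', 'A', 'B', 'Base', 'object']
```

The class that does the real work goes last in the list of parents, so that every wrapper placed before it gets a chance to run and then pass the call along.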