Programming

Programming#

Learning objectives#

  • Review the essential concepts of programming: variables, flow control, data structures, etc.

  • Demonstrate their implementation in the Python programming language.

  • Discover what a Jupyter notebook is.

  • Learn about some good practices for Python-based software development.

Environment setup#

%pip install papermill

# Relax some linting rules not needed here
# pylint: disable=invalid-name,redefined-outer-name,consider-using-f-string,duplicate-value,unnecessary-lambda-assignment,protected-access,too-few-public-methods,wrong-import-position,unused-import,consider-swap-variables,consider-using-enumerate,too-many-lines

import platform
import os

print(f"Python version: {platform.python_version()}")
Python version: 3.11.1

The Python language#

Python in a nutshell#

Python is a multi-purpose programming language created by Guido van Rossum, who started its implementation in 1989 and first released it in 1991. It is developed under an open source license.

It has the following characteristics:

  • multi-paradigm (procedural, functional, object-oriented);

  • dynamic typing;

  • automatic memory management;

  • and much more!

Despite its maturity, Python is still evolving, with new features added in each release (see What’s in which Python).

The Python syntax#

For many more examples, see the cheatsheet below.

def hello(name):
    """Say hello to someone"""

    print(f"Hello, {name}")


friends = ["Lou", "David", "Iggy"]

for friend in friends:
    hello(friend)
Hello, Lou
Hello, David
Hello, Iggy

A prominent language#

Python has become the language of choice for artificial intelligence and Data Science, for the following reasons:

  • language qualities (ease of use, simplicity, versatility);

  • involvement of the scientific and academic communities;

  • rich ecosystem of dedicated open source libraries.

The Jupyter Notebook format#

The Jupyter Notebook is an open-source web application for creating and managing documents (called notebooks) that may contain executable code, equations, visualizations and text.

A notebook file has an .ipynb extension. It contains blocks of text called cells, written in either code or Markdown. Notebooks have become a standard for experimenting and sharing results in many scientific fields.
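
Since an .ipynb file is a JSON document, its cells can be inspected with the standard library alone. The sketch below builds a deliberately minimal notebook by hand (the cell contents are made up for illustration, and real notebooks carry more metadata) and lists the type of each cell.

```python
import json

# A hand-written, minimal notebook following the nbformat 4 layout
# (illustrative content only)
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# A title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print('Hello')"],
        },
    ],
}

# Serializing the dictionary gives the text an .ipynb file would contain
ipynb_text = json.dumps(notebook, indent=1)

# Reading it back: list the type of each cell
cell_types = [cell["cell_type"] for cell in json.loads(ipynb_text)["cells"]]
print(cell_types)
```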

Running Python code#

Python code can be run either:

  • locally, after installing a Python environment. Anaconda, a scientific distribution including many (1500+) specialized packages, is a convenient way to set up a work environment for AI with Python.

  • in the cloud, using an online service for executing raw Python code or Jupyter notebooks. For example, Google Colaboratory offers free access to specialized processors.

(Yet another) Python cheatsheet#

Inspired by A Whirlwind Tour of Python and another Python Cheatsheet.

Basics#

# Print function
print("Hello World!")

# Optional separator
print(1, 2, 3)
print(1, 2, 3, sep="--")

# Variables (dynamically typed)
mood = "happy"  # or 'happy'

print("I'm", mood)
Hello World!
1 2 3
1--2--3
I'm happy

String formatting#

name = "Garance"
age = 16

# printf-style formatting with the % operator (original syntax)
message = "My name is %s and I'm %s years old." % (
    name,
    age,
)
print(message)

# str.format() method (Python 2.6+)
message = "My name is {} and I'm {} years old.".format(name, age)
print(message)

# f-string (Python 3.6+)
# https://realpython.com/python-f-strings/
# https://cito.github.io/blog/f-strings/
message = f"My name is {name} and I'm {age} years old."
print(message)
My name is Garance and I'm 16 years old.
My name is Garance and I'm 16 years old.
My name is Garance and I'm 16 years old.

Numbers and arithmetic#

# Type: int
a = 0

# Type: float
b = 3.14

# Variable swapping
a, b = b, a
print(a, b)

# Float and integer divisions
print(13 / 2)
print(13 // 2)

# Exponentiation operator
print(3**2)
print(2**3)
3.14 0
6.5
6
9
8

Flow control#

The if/elif/else statement#

name = "Bob"
age = 30
if name == "Alice":
    print("Hi, Alice.")
elif age < 12:
    print("You are not Alice, kiddo.")
else:
    print("You are neither Alice nor a little kid.")
You are neither Alice nor a little kid.

The while loop#

num = 1

while num <= 10:
    print(num)
    num += 1
1
2
3
4
5
6
7
8
9
10

The for/else loop#

The optional else clause runs only if the loop finishes without executing break. It is therefore only useful when a break condition can occur in the loop.

for i in [1, 2, 3, 4, 5]:
    if i == 3:
        print(i)
        break
else:
    print("No item of the list is equal to 3")
3
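
To see the else branch actually run, here is a small sketch built around a hypothetical helper, first_divisor, which is not part of the original notes. The else clause is reached only when no divisor triggered the break.

```python
def first_divisor(n):
    """Return the smallest divisor of n greater than 1, or None if there is none"""

    for candidate in range(2, n):
        if n % candidate == 0:
            break
    else:
        # Reached only when the loop ran to completion without break
        return None
    return candidate


print(first_divisor(15))  # 3
print(first_divisor(13))  # None
```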

Data structures#

Lists#

countries = ["France", "Belgium", "India"]

print(len(countries))
print(countries[0])
print(countries[-1])

# Add element at end of list
countries.append("Ecuador")

print(countries)
3
France
India
['France', 'Belgium', 'India', 'Ecuador']

List indexing and slicing#

print(countries[1:3])
print(countries[0:-1])
print(countries[:2])
print(countries[1:])
print(countries[:])
print(countries[::-1])
['Belgium', 'India']
['France', 'Belgium', 'India']
['France', 'Belgium']
['Belgium', 'India', 'Ecuador']
['France', 'Belgium', 'India', 'Ecuador']
['Ecuador', 'India', 'Belgium', 'France']

Tuples#

Contrary to lists, tuples are immutable (read-only).

eggs = ("hello", 42, 0.5)

print(eggs[0])
print(eggs[1:3])

# TypeError: a tuple is immutable
# eggs[0] = "bonjour"
hello
(42, 0.5)
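
Two common consequences of immutability can be sketched as follows: tuples are conveniently unpacked into variables and, when their elements are hashable, they can serve as dictionary keys. The names point and distances are illustrative.

```python
point = (3, 4)

# Tuple unpacking: one variable per element
x, y = point
print(x, y)  # 3 4

# Tuples of hashable elements can be used as dictionary keys
distances = {(0, 0): 0.0, (3, 4): 5.0}
print(distances[point])  # 5.0

# Lists are mutable, hence unhashable: using one as a key raises TypeError
try:
    distances[[3, 4]] = 5.0
    list_key_allowed = True
except TypeError:
    list_key_allowed = False
print(list_key_allowed)  # False
```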

Dictionaries#

numbers = {"one": 1, "two": 2, "three": 3}

numbers["ninety"] = 90
print(numbers)

for key, value in numbers.items():
    print(f"{key} => {value}")
{'one': 1, 'two': 2, 'three': 3, 'ninety': 90}
one => 1
two => 2
three => 3
ninety => 90

Sets#

A set is an unordered collection of unique items.

# Duplicate values are automatically removed
s = {1, 2, 3, 2, 3, 4}
print(s)
{1, 2, 3, 4}

Union, intersection and difference of sets#

primes = {2, 3, 5, 7}
odds = {1, 3, 5, 7, 9}

print(primes | odds)
print(primes & odds)
print(primes - odds)
{1, 2, 3, 5, 7, 9}
{3, 5, 7}
{2}

Functions#

Function definition#

def square(x):
    """Returns the square of x"""

    return x**2

Function call#

# Print function docstring
help(square)

print(square(0))
print(square(3))
Help on function square in module __main__:

square(x)
    Returns the square of x

0
9

Default parameter values#

def fibonacci(n, a=0, b=1):
    """Returns a list of the n first Fibonacci numbers"""

    l = []
    while len(l) < n:
        a, b = b, a + b
        l.append(a)
    return l


print(fibonacci(7))
[1, 1, 2, 3, 5, 8, 13]

Flexible function arguments#

def catch_all_args(*args, **kwargs):
    """Demonstrates the use of *args and **kwargs"""

    print(f"args = {args}")
    print(f"kwargs = {kwargs}")


catch_all_args(1, 2, 3, a=10, b="hello")
args = (1, 2, 3)
kwargs = {'a': 10, 'b': 'hello'}
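
The same * and ** notations also work in the other direction, at the call site, to unpack a sequence into positional arguments and a dictionary into keyword arguments. A minimal sketch, with a hypothetical describe function:

```python
def describe(name, age, city="unknown"):
    """Build a short description from positional and keyword arguments"""

    return f"{name}, {age}, from {city}"


positional = ("Lou", 30)
named = {"city": "Paris"}

# Equivalent to describe("Lou", 30, city="Paris")
result = describe(*positional, **named)
print(result)  # Lou, 30, from Paris
```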

Lambda (anonymous) functions#

add = lambda x, y: x + y

print(add(1, 2))
3
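
Lambdas are mostly used as short, throwaway arguments to other functions rather than assigned to names. The sketch below passes one as the key parameter of the built-in sorted function (the word list is illustrative).

```python
words = ["banana", "fig", "cherry"]

# Sort by word length; ties keep their original order (the sort is stable)
by_length = sorted(words, key=lambda w: len(w))
print(by_length)  # ['fig', 'banana', 'cherry']

# Sort by last letter
by_last_letter = sorted(words, key=lambda w: w[-1])
print(by_last_letter)  # ['banana', 'fig', 'cherry']
```

When the lambda only forwards its argument to an existing function, as in the first call, passing that function directly (key=len) is simpler.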

Iterators#

A unified interface for iterating#

for element in [1, 2, 3]:
    print(element)
for element in (4, 5, 6):
    print(element)
for key in {"one": 1, "two": 2}:
    print(key)
for char in "ABC":
    print(char)
1
2
3
4
5
6
one
two
A
B
C

Under the hood#

  • An iterable is an object that has an __iter__ method which returns an iterator to provide iteration support.

  • An iterator is an object with a __next__ method which returns the next iteration element.

  • A sequence is an iterable which supports access by integer position. Lists, tuples, strings and range objects are examples of sequences.

  • A mapping is an iterable which supports access via keys. Dictionaries are examples of mappings.

  • Iterators are used implicitly by many looping constructs.
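
The protocol described above can be exercised by hand with the built-in iter() and next() functions. The sketch below also defines a hypothetical Countdown class that is its own iterator, to show what a for loop relies on.

```python
# What a for loop does implicitly: ask for an iterator, then call next()
# until StopIteration is raised
iterator = iter([1, 2])
first = next(iterator)
second = next(iterator)
try:
    next(iterator)
    exhausted = False
except StopIteration:
    exhausted = True
print(first, second, exhausted)  # 1 2 True


class Countdown:
    """Iterates from start down to 1"""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # This object is its own iterator
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


print(list(Countdown(3)))  # [3, 2, 1]
```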

The range() function#

range() doesn’t return a list, but a range object: an iterable that computes its values on demand.

for i in range(10):
    if i % 2 == 0:
        print(f"{i} is even")
    else:
        print(f"{i} is odd")
0 is even
1 is odd
2 is even
3 is odd
4 is even
5 is odd
6 is even
7 is odd
8 is even
9 is odd
for i in range(0, 10, 2):
    print(i)
0
2
4
6
8
for i in range(5, -1, -1):
    print(i)
5
4
3
2
1
0
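
Because a range object computes its values on demand, even a very large range is cheap to create and query. A small sketch (the bounds are arbitrary):

```python
# Half a billion even numbers, without building any list in memory
evens = range(0, 10**9, 2)

print(len(evens))  # 500000000
print(evens[10])  # 20
print(evens[-1])  # 999999998

# Membership tests on integers do not iterate over the whole range
print(999_999_998 in evens)  # True
print(7 in evens)  # False
```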

The enumerate() function#

supplies = ["pens", "staplers", "flame-throwers", "binders"]

for i, supply in enumerate(supplies):
    print(f"Index {i} in supplies is: {supply}")
Index 0 in supplies is: pens
Index 1 in supplies is: staplers
Index 2 in supplies is: flame-throwers
Index 3 in supplies is: binders

Comprehensions#

Principle#

  • Comprehensions provide a concise way to create collections (lists, sets, dictionaries).

  • General syntax: [expr for var in iterable].
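
The general syntax can be extended with an optional filtering condition and with several for clauses. A short sketch of both variants:

```python
# Filtering: keep only the squares of even numbers
even_squares = [n**2 for n in range(10) if n % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]

# Several for clauses behave like nested loops, read from left to right
pairs = [(x, y) for x in range(3) for y in range(x)]
print(pairs)  # [(1, 0), (2, 0), (2, 1)]
```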

List comprehensions#

# https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions

# Using explicit code
squared_numbers = []
stop = 10

for n in range(stop):
    squared_numbers.append(n**2)

print(squared_numbers)

# Using a list comprehension
print([n**2 for n in range(stop)])
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Set and dictionary comprehensions#

# Create an uppercase set
s = {"abc", "def"}
print({e.upper() for e in s})

# Compute remainders modulo 4 (duplicates are eliminated)
print({a % 4 for a in range(1000)})

# Switch keys and values
d = {"name": "Prosper", "age": 12}
print({v: k for k, v in d.items()})
{'DEF', 'ABC'}
{0, 1, 2, 3}
{'Prosper': 'name', 12: 'age'}

Generators#

Principle#

  • A generator defines a recipe for producing values.

  • A generator does not actually compute the values until they are needed.

  • It exposes an iterator interface. As such, it is a basic form of iterable.

  • It can only be iterated once.
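
Lazy evaluation can be made visible by recording when each value is actually computed. In this sketch, traced_square and the computed list are illustrative helpers, not part of any library.

```python
computed = []


def traced_square(n):
    """Square n and record that the computation took place"""

    computed.append(n)
    return n**2


lazy = (traced_square(n) for n in range(5))

# Creating the generator has not computed anything yet
print(computed)  # []

# Values are produced one at a time, on request
first = next(lazy)
print(first, computed)  # 0 [0]

# Consuming the rest exhausts the generator
rest = list(lazy)
print(rest)  # [1, 4, 9, 16]
print(list(lazy))  # []
```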

Generator expressions#

They use parentheses, not square brackets like list comprehensions.

g1 = (n**2 for n in range(stop))

print(list(g1))
print(list(g1))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
[]

Generator functions#

  • A function that, rather than using return to return a value once, uses yield to yield a (potentially infinite) sequence of values.

  • Useful when the generator algorithm gets complicated.

def gen():
    """Generates squared numbers"""

    for n in range(stop):
        yield n**2


g2 = gen()
print(list(g2))
print(list(g2))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
[]
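
A generator function can also describe a sequence that never ends, as long as the caller only consumes a finite part of it. The sketch below uses a hypothetical naturals generator together with itertools.islice from the standard library.

```python
from itertools import islice


def naturals():
    """Generates 0, 1, 2, ... without ever stopping"""

    n = 0
    while True:
        yield n
        n += 1


# Only the first five values are ever computed
first_squares = list(islice((n**2 for n in naturals()), 5))
print(first_squares)  # [0, 1, 4, 9, 16]
```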

Object-oriented programming#

Class definition#

# https://docs.python.org/3/tutorial/classes.html


class Account:
    """Represents a bank account"""

    def __init__(self, initial_balance):
        self.balance = initial_balance

    def credit(self, amount):
        """Credits money to the account"""

        self.balance += amount

    def __str__(self):
        return f"Account balance: {self.balance}"

Class instantiation#

new_account = Account(100)
print(new_account)

new_account.credit(-40)
print(new_account.balance)
Account balance: 100
60

Instance properties#

# https://docs.python.org/3/library/functions.html#property


class Vehicle:
    """Represents a vehicle"""

    def __init__(self, number_of_wheels, type_of_tank):
        # The leading underscore designates internal ("private") attributes
        self._number_of_wheels = number_of_wheels
        self._type_of_tank = type_of_tank

    @property
    def number_of_wheels(self):
        """Number of wheels"""

        return self._number_of_wheels

    @number_of_wheels.setter
    def number_of_wheels(self, number):
        self._number_of_wheels = number

Using instance properties#

my_strange_vehicle = Vehicle(4, "electric")
my_strange_vehicle.number_of_wheels = 2
print(my_strange_vehicle.number_of_wheels)

# Works, but frowned upon (accessing a private attribute)
# We should use a property instead
print(my_strange_vehicle._type_of_tank)
2
electric
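
A frequent reason for using a property rather than a plain attribute is to validate values in the setter, or to expose a computed, read-only value. A minimal sketch, with a hypothetical Temperature class:

```python
class Temperature:
    """Represents a temperature stored in degrees Celsius"""

    def __init__(self, celsius):
        # Assigning to the property goes through the setter below
        self.celsius = celsius

    @property
    def celsius(self):
        """Temperature in degrees Celsius"""

        return self._celsius

    @celsius.setter
    def celsius(self, value):
        if value < -273.15:
            raise ValueError("Temperature below absolute zero")
        self._celsius = value

    @property
    def fahrenheit(self):
        """Read-only computed property (no setter defined)"""

        return self._celsius * 9 / 5 + 32


boiling = Temperature(100)
print(boiling.fahrenheit)  # 212.0

try:
    boiling.celsius = -300
except ValueError as error:
    print(error)  # Temperature below absolute zero
```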

Class attributes#

class Employee:
    """Represents an employee"""

    empCount = 0  # Class-level attribute, shared by all instances

    def __init__(self, name, salary):
        self._name = name
        self._salary = salary
        Employee.empCount += 1

    @staticmethod
    def count():
        """Count the number of employees"""

        return f"Total employees: {Employee.empCount}"


e1 = Employee("Ben", "30")
print(Employee.count())
Total employees: 1

Inheritance#

class Animal:
    """Represents an animal"""

    def __init__(self, species):
        self.species = species


class Dog(Animal):
    """Represents a specific animal: a dog"""

    def __init__(self, name):
        super().__init__("Mammal")
        self.name = name


doggo = Dog("Fang")
print(doggo.name)
print(doggo.species)
Fang
Mammal
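
Beyond reusing the parent initializer, a subclass can override methods that the parent class calls. The Shape and Rectangle classes below are illustrative and show this with super() and isinstance().

```python
class Shape:
    """Represents a generic shape"""

    def __init__(self, name):
        self.name = name

    def area(self):
        """Subclasses are expected to override this method"""

        raise NotImplementedError

    def describe(self):
        """Uses whichever area() implementation the actual object provides"""

        return f"{self.name} with area {self.area()}"


class Rectangle(Shape):
    """Represents a specific shape: a rectangle"""

    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


rect = Rectangle(2, 3)
print(rect.describe())  # rectangle with area 6
print(isinstance(rect, Shape))  # True
```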

Dataclasses#

Simplified syntax for attribute-centric classes.

# https://realpython.com/python-data-classes/

from dataclasses import dataclass


@dataclass
class Student:
    """Represents a student"""

    name: str
    id: int


student = Student("Jack", 123456)
print(student)
Student(name='Jack', id=123456)
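
Besides the generated __init__ and __repr__ methods, a dataclass also gets value-based equality, and it supports default values and optional immutability. A short sketch with a hypothetical Point class:

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Point:
    """Represents an immutable 2D point"""

    x: float
    y: float = 0.0  # Default value


p = Point(1.5)
print(p)  # Point(x=1.5, y=0.0)

# Equality compares field values, not object identity
print(p == Point(1.5, 0.0))  # True

# frozen=True forbids attribute assignment after creation
try:
    p.x = 3.0
    is_frozen = False
except FrozenInstanceError:
    is_frozen = True
print(is_frozen)  # True
```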

Modules and packages#

# Importing all module content into a namespace
import math

print(math.cos(math.pi))  # -1.0

# Importing specific module content into local namespace
from math import cos, pi

print(cos(pi))  # -1.0

# Aliasing an import
import numpy as np

print(np.cos(np.pi))  # -1.0
-1.0
-1.0
-1.0

Python good practices#

Writing pythonic code#

import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

What does “Pythonic” mean?#

  • Python code is considered pythonic if it:

    • conforms to the Python philosophy;

    • takes advantage of the language’s specific features.

  • Pythonic code is nothing more than idiomatic Python code that strives to be clean, concise and readable.

Most linting tools (see below) enforce pythonicity.

Example: swapping two variables#

a = 3
b = 2

# Non-pythonic
tmp = a
a = b
b = tmp
print(a, b)

# Pythonic
a, b = b, a
print(a, b)
2 3
3 2

Example: iterating on a list#

my_list = ["a", "b", "c"]


# Non-pythonic
i = 0
while i < len(my_list):
    print(my_list[i])
    i += 1

# Still non-pythonic
for i in range(len(my_list)):
    print(my_list[i])

# Pythonic
for item in my_list:
    print(item)
a
b
c
a
b
c
a
b
c

Example: indexed traversal#

my_list = ["a", "b", "c"]

# Non-pythonic
for i in range(len(my_list)):
    print(i, "->", my_list[i])

# Pythonic
for i, item in enumerate(my_list):
    print(i, "->", item)
0 -> a
1 -> b
2 -> c
0 -> a
1 -> b
2 -> c

Example: searching in a list#

fruits = ["apples", "oranges", "bananas", "grapes"]
fruit = "cherries"

# Non-pythonic
found = False
size = len(fruits)
for i in range(0, size):
    if fruits[i] == fruit:
        found = True
print(found)

# Pythonic
found = fruit in fruits
print(found)
False
False

Example: generating a list#

numbers = [1, 2, 3, 4, 5, 6]

# Non-pythonic
doubles = []
for i in range(len(numbers)):
    if numbers[i] % 2 == 0:
        doubles.append(numbers[i] * 2)
    else:
        doubles.append(numbers[i])
print(doubles)

# Pythonic
doubles = [x * 2 if x % 2 == 0 else x for x in numbers]
print(doubles)
[1, 4, 3, 8, 5, 12]
[1, 4, 3, 8, 5, 12]

Code style#

  • PEP8 is the official style guide for Python. Among other things, it recommends to:

    • use 4 spaces for indentation;

    • limit line length (79 characters by default);

    • group imports at the top of the file;

    • surround binary operators with a single space on each side.

  • Code style can be enforced automatically by a formatter like Black.

Beyond PEP8#

Focusing on style and PEP8-compliance might make you miss more fundamental code flaws.

from IPython.display import YouTubeVideo

YouTubeVideo("wf-BqAjZb8M")

Docstrings#

A docstring is a string literal that occurs as the first statement in a module, function, class, or method definition to document it.

All modules, classes, public methods and exported functions should include a docstring.

def create_complex(real=0.0, imag=0.0):
    """Form a complex number.

    Keyword arguments:
    real -- the real part (default 0.0)
    imag -- the imaginary part (default 0.0)
    """

    # ... (do something useful with parameters)
    _ = real, imag

Code linting#

  • Linting is the process of checking code for syntactical and stylistic problems before execution.

  • It is useful to catch errors and improve code quality in dynamically typed, interpreted languages, where no ahead-of-time compilation step reports such problems.

  • Several linters exist in the Python ecosystem. A popular choice is Pylint.

Type annotations#

  • Type hints were standardized in Python 3.5 (PEP 484). They annotate code entities like variables or functions with their expected types, bringing a statically typed flavour to the language. The interpreter does not enforce them at runtime.

  • mypy can automatically check the code for annotation correctness.

def greeting(name: str) -> str:
    """Greet someone"""

    return "Hello " + name


greeting("Alice")  # OK

# greeting(3)  # mypy error: incompatible type "int"; expected "str"
'Hello Alice'
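
Type annotations are stored as metadata and are not checked by the interpreter when the code runs, which is why an external tool is needed. The sketch below, built around a hypothetical double function, shows a call that a type checker would be expected to reject but that executes without error.

```python
def double(value: int) -> int:
    """Double a value"""

    return value * 2


# Annotations are available for tools through a plain dictionary
print(double.__annotations__)

# No runtime check: a str argument is accepted, and * repeats the string
result = double("ab")
print(result)  # abab
```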

Unit tests#

Unit tests automate the testing of individual code elements like functions or methods, thus decreasing the risk of bugs and regressions.

They can be implemented in Python using tools like unittest or pytest.

def inc(x):
    """Increment a value"""

    return x + 1


assert inc(3) == 4

# assert inc(3) == 5  # AssertionError
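
With unittest, checks like the one above are grouped as methods of a test class. The following sketch redefines inc so that it is self-contained, and runs the tests programmatically (the class name IncTest is arbitrary). In a regular project, tests would rather live in separate files and be launched from the command line.

```python
import io
import unittest


def inc(x):
    """Increment a value"""

    return x + 1


class IncTest(unittest.TestCase):
    """Unit tests for the inc function"""

    def test_positive_value(self):
        self.assertEqual(inc(3), 4)

    def test_negative_value(self):
        self.assertEqual(inc(-1), 0)


# Load and run the tests, capturing the report in memory
suite = unittest.defaultTestLoader.loadTestsFromTestCase(IncTest)
outcome = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print(outcome.testsRun, outcome.wasSuccessful())  # 2 True
```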

Packaging and dependency management#

Managing dependencies in Python#

  • Most Python apps depend on third-party libraries and frameworks (NumPy, Flask, Requests…).

  • These tools may also have external dependencies, and so on.

  • Dependency management is necessary to prevent version conflicts and incompatibilities. It involves two things:

    • a way for the app to declare its dependencies;

    • a tool to resolve these dependencies and install compatible versions.

Semantic versioning#

  • Software versioning convention used in many ecosystems.

  • A version number comes as a sequence of three numbers X.Y.Z.

    • X = major version (potentially including breaking changes).

    • Y = minor version (new features, only non-breaking changes).

    • Z = patch (backward-compatible bug fixes).

  • Numbers are incremented as new versions are shipped.
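
A practical consequence is that version numbers must be compared component by component, as numbers and not as text. The helpers below (parse_version and is_breaking_upgrade) are hypothetical and only handle plain X.Y.Z strings.

```python
def parse_version(text):
    """Convert 'X.Y.Z' into a tuple of integers (major, minor, patch)"""

    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def is_breaking_upgrade(current, candidate):
    """Tell whether moving to candidate changes the major version"""

    return parse_version(candidate)[0] > parse_version(current)[0]


# Tuples compare component by component: 1.10.0 is newer than 1.9.3
print(parse_version("1.10.0") > parse_version("1.9.3"))  # True

# Comparing the raw strings gives the wrong answer
print("1.10.0" > "1.9.3")  # False

print(is_breaking_upgrade("1.9.3", "1.10.0"))  # False
print(is_breaking_upgrade("1.9.3", "2.0.0"))  # True
```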

pip and requirements.txt#

A requirements.txt file is the most basic way of declaring dependencies in Python.

certifi>=2020.11.0
chardet==4.0.0
click>=6.5.0, <7.1
download==0.3.5
Flask>=1.1.0

The pip package installer can read this file and act accordingly, downloading dependencies from PyPI.

pip install -r requirements.txt

Virtual environments#

  • A virtual environment is an isolated Python environment where a project’s dependencies are installed.

  • Using them prevents the risk of mixing dependencies required by different projects on the same machine.

  • Several tools exist to manage virtual environments in Python, for example virtualenv and conda.
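
From within Python, one way to check whether the interpreter runs inside a virtual environment is to compare sys.prefix with sys.base_prefix. This sketch assumes an environment created with venv or a recent virtualenv. Conda environments work differently and are not detected this way.

```python
import sys


def in_virtual_environment():
    """Tell whether the interpreter runs inside a venv-style environment"""

    # Inside such an environment, sys.prefix points to the environment folder,
    # while sys.base_prefix still points to the base Python installation
    return sys.prefix != sys.base_prefix


print(f"Interpreter location: {sys.prefix}")
print(f"Virtual environment active: {in_virtual_environment()}")
```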

conda and environment.yml#

Installed as part of the Anaconda distribution, the conda package manager reads an environment.yml file to install the dependencies associated with a specific virtual environment.

name: example-env

channels:
  - conda-forge
  - defaults

dependencies:
  - python=3.7
  - matplotlib
  - numpy

Poetry#

Poetry is a recent packaging and dependency management tool for Python. It downloads packages from PyPI by default.

# Create a new poetry-compliant project in the my-project folder
poetry new my-project

# Initialize an already existing project for Poetry
poetry init

# Install defined dependencies
poetry install

# Add a package to project dependencies and install it
poetry add <package name>

# Update dependencies to the latest versions allowed by the configuration file
poetry update

Poetry and virtual environments#

By default, Poetry creates a virtual environment for the configured project in a user-specific folder. A common practice is to store it in the project’s folder instead.

# Tell Poetry to store the environment in the local project folder
poetry config virtualenvs.in-project true

# Activate the environment
poetry shell

The pyproject.toml file#

Poetry stores its configuration in pyproject.toml, a file format standardized for Python projects (PEP 518) and shared with many other tools.

[tool.poetry]
name = "poetry example"
version = "0.1.0"
description = ""

[tool.poetry.dependencies]
python = ">=3.7.1,<3.10"
jupyter = "^1.0.0"
matplotlib = "^3.3.2"
sklearn = "^0.0"  # NB: the "sklearn" alias on PyPI is deprecated; prefer "scikit-learn"
pandas = "^1.1.3"
ipython = "^7.0.0"

[tool.poetry.dev-dependencies]
pytest = "^6.1.1"

Caret requirements#

Caret (^) and tilde (~) requirements offer a way to define ranges of acceptable dependency versions.

Requirement    Versions allowed
^1.2.3         >=1.2.3 <2.0.0
^1.2           >=1.2.0 <2.0.0
~1.2.3         >=1.2.3 <1.3.0
~1.2           >=1.2.0 <1.3.0
1.2.3          1.2.3 only
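
The rules listed above can be sketched as a small function computing the exclusive upper bound of a requirement. This is not Poetry's own implementation: upper_bound is a hypothetical helper that only covers the listed cases and assumes a major version of at least 1 for caret requirements.

```python
def upper_bound(requirement):
    """Return the first version NOT allowed by a caret or tilde requirement"""

    operator, version = requirement[0], requirement[1:]
    numbers = [int(part) for part in version.split(".")]

    if operator == "^":
        # Caret: stay below the next major version (assumes major >= 1)
        major, minor = numbers[0] + 1, 0
    elif operator == "~":
        if len(numbers) == 1:
            # Only a major version given: stay below the next major version
            major, minor = numbers[0] + 1, 0
        else:
            # Tilde: stay below the next minor version
            major, minor = numbers[0], numbers[1] + 1
    else:
        raise ValueError(f"Unsupported requirement: {requirement}")

    return f"{major}.{minor}.0"


for requirement in ["^1.2.3", "^1.2", "~1.2.3", "~1.2"]:
    print(requirement, "allows versions below", upper_bound(requirement))
```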

The poetry.lock file#

  • The first time Poetry installs dependencies, it creates a poetry.lock file that contains the exact versions of all installed packages.

  • Subsequent installs will use these exact versions to ensure consistency.

  • Removing this file and running another Poetry install will fetch the latest matching versions.

Working with notebooks#

Advantages of Jupyter notebooks#

  • Standard format for mixing text, images and (executable) code.

  • Open source and platform-independent.

  • Useful for experimenting and prototyping.

  • Growing ecosystem of extensions for various purposes and cloud hosting solutions (Colaboratory, AI notebooks…).

  • Integration with tools like Visual Studio Code.

Drawbacks of Jupyter notebooks#

  • Arbitrary execution order of cells can cause confusing errors.

  • Notebooks don’t encourage good programming habits like modularization, linting and tests.

  • Being JSON-based, their versioning is more difficult than for plain text files.

  • Dependency management is also difficult, thus hindering reproducibility.

Collaborating with notebooks#

A common solution for sharing notebooks within a team is to use Jupytext. This tool can associate an .ipynb file with a Python file to facilitate collaboration and version control.

Code organization#

Monolithic notebooks can grow over time and become hard to understand and maintain.

Just like in a traditional software project, it is possible to split them into separate parts, thus following the separation of concerns design principle.

Code can be split into several sub-notebooks and/or external Python files. The latter facilitates unit testing and version control.

Notebook workflow#

Tools like papermill can orchestrate the execution of several notebooks in a row. External parameters can be passed to notebooks, and the runtime flow can depend on the execution results of each notebook.

import papermill as pm

# Doesn't work on Google Colaboratory. Workaround here:
# https://colab.research.google.com/github/rjdoubleu/Colab-Papermill-Patch/blob/master/Colab-Papermill-Driver.ipynb
notebook_dir = "./_papermill"
result = pm.execute_notebook(
    os.path.join(notebook_dir, "simple_input.ipynb"),
    os.path.join(notebook_dir, "simple_output.ipynb"),
    parameters={"msg": "Hello"},
)

Additional resources#

The Little Book of Python Anti-Patterns