Python Language Modularization
External
- https://realpython.com/python-modules-packages
- https://docs.python.org/3/tutorial/modules.html
- https://docs.python.org/3/reference/import.html
Internal
Overview
Python code is organized in units like standalone programs, modules and packages. When a module or a package is published, people refer to it as a library. In this context, the term library is simply a generic term for a bunch of code that was designed with the aim of being reused by many applications. It provides some generic functionality that can be used by specific applications.
Standalone Program
A standalone program consists of one or more files of code that is read by the Python interpreter and executed. A typical way to interact with a Python program is command line arguments. For more details on handling command line arguments, see:
Python Script
A script is a module whose aim is to be executed. It has the same meaning as "program", standalone program, or "application", but it is usually used to describe simple and small program. It contains a stored set of instructions that can be handed over to the Python interpreter:
python3 ./my-script.py
Python scripts have .py
extensions.
The python code can be specified in-line with a here-doc:
python3 <<EOF
print('hello')
print('hello2')
EOF
The same approach can be taken when Python code needs to be executed from within a bash script, for more details see:
Module
A module is an organizational unit of Python code. It consists of one file. The module can be imported inside another Python program or executed on its own. The module can define variables, functions and classes. If intended to run on its own, the module will also include runnable code. The file name consists of the module name with the suffix .py
appended. Modules are loaded into Python by the process of importing, where the code in one module is made available to Python code in another module. The files of Python modules and packages are managed by a Python package manager like pip, Conda, Pipenv and Poetry.
Modules have a namespace containing arbitrary Python objects.
Module Name
Naming conventions are documented here: https://www.python.org/dev/peps/pep-0008/#id36: Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. The name of the module cannot contain dashes ('-'). The name of the directory the module is stored in can contain dashes.
Importing
The process of importing is loading code of a module or a package to make it available to other module.
Importing a Entire Module
The entire code of a module is imported into another module by using the import
statement, listed at the top of the file, followed by the name of the module, which is the name of the module's Python file, without the .py
extension. The file corresponding to the module being imported must be accessible, as described in Locating Module Files.
import mymodule
Multiple comma-separated module names can be specified in the same import statement, but various static analysis programs flag this as a style violation:
import mymodule, mymodule2 # The style checker flags this as a style violation
The syntax import <module-name>
imports the entire content of the module. After this import statement, everything in the imported module is available to the program that imports it. Specific constructs from an imported module, such as functions, are then referred by prefixing the name of the construct with the name of the module. This is called qualifying the contents of a module with the module's name:
import mymodule
[...]
mymodule.my_func()
Importing a Module from a Function
A module can be imported at the top of the file, or from inside a function. You should consider importing from outside the function if the imported code might be used in more than one place, and from inside if you know its use will be limited. Putting all imports at the top of the file makes all dependencies of the code explicit.
def some_func():
import mymodule
[...]
mymodule.my_func()
Since the function has its own namespace, and the chance of a collision with another identical name is non-existent, we can avoid qualifying the name of the function from the imported module with the name of the module:
def some_func():
import mymodule
[...]
my_func()
Even so, always qualifying the name of the used construct with the name of the module is a safer choice and it is considered good practice.
Importing a Module with Another Name
The constructs of an imported module are qualified with the name of the module to be used. The prefix can be changed, usually to shorten it, by using the as
reserved word in the import statement. This syntax effectively renames the module being imported. The same technique is useful if there are two modules with the same name.
import mymodule as m
[...]
m.my_func()
Importing only Specific Constructs from a Module
If not all constructs exposed by an imported module are useful, only individual constructs, such a specific function, can be imported, using the from
/import
reserved combination:
from mymodule import my_func
[...]
# invoke the function directly, without prefixing it with the name of the module
my_func()
Each part can keep its original name or it can be aliased:
from mymodule import my_func as m_f
[...]
m_f()
Locating Module Files - Module Search Path
If the file corresponding to the module being imported resides in the same directory as the file in which the module is imported into, then the import completes successfully without additional configuration. Otherwise, the runtime looks at a list of directory names and ZIP files stored in the standard sys
module as the variable path
. This list can be accessed and modified:
import sys
for i in sys.path:
print(i)
/opt/brew/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python39.zip /opt/brew/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9 /opt/brew/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/lib-dynload /Users/ovidiu/my-project/venv/lib/python3.9/site-packages
The initial blank output line is the empty string '', which stands for current directory. The first match will be used. If a module with the same name as a module from standard library is encountered in the search path before the standard library, it will be used instead of the module coming from the standard library.
Extending Module Search Path
Externally
Set the environment variable PYTHONPATH
to a colon-separated list of directories to search for imported modules. A directory declared in PYTHONPATH
will be inserted at the beginning of the sys.path
list when Python starts up.
Internally from the Program
Append to sys.path
Appending an absolute path works:
import sys
sys.path.append('/path/to/search')
Appending a relative path does not work, unless the script is executed from the directory the path is relative to.
import sys
sys.path.append('./my-module') # This does not work unless the Python script is executed from "my-module"'s parent.
To use a relative directory and make append() insensitive to the location the program is run from, use this pattern:
import os
import sys
sys.path.append(os.path.dirname(__file__) + "/my-module")
import my_module
PyCharm trick: If the module name and the parent directory have the same name, PyCharm will stop issuing static analysis error "No module named ..."
Append to site.addsitedir
Another way is to use site.addsitedir
to add a directory to sys.path
. The difference between this and just plain appending is that when you use addsitedir
, it also looks for .pth
files within that directory and uses them to possibly add additional directories to sys.path
based on the contents of the files.
Package
A package is Python code stored into multiple files, organized into a file hierarchy. A package may contain multiple modules, each stored in its own file. The package may also contain a file named __init__.py
The package directory can recursively contain sub-packages. Packages allow for a hierarchical structuring of the module namespace using dot notation. In the same way that modules avoid collisions between global variable names, packages avoid collision between module names. For example, the urllib package contains several modules: urllib.request
, urllib.error
, etc. Modules and packages are managed by a Python package manager like pip, Conda, Pipenv and Poetry.
Package Name
See module name above.
Regular Package
A traditional package, such as a directory containing an __init__.py
file.
__init__.py
When a regular package is imported, this __init__.py
file is implicitly executed, and the objects it defines are bound to names in the package’s namespace. The __init__.py
file can contain the same Python code that any other module can contain, and Python will add some additional attributes to the module when it is imported.
__main__.py
The file contains the code of the "main" module, which will be imported automatically when the package is imported.
Namespace Package
A PEP 420 package which serves only as a container for subpackages. Namespace packages may have no physical representation, and have no __init__.py
file.
Package Metadata
Name: pulumi
Version: 2.11.2
Summary: Pulumi's Python SDK
Home-page: https://github.com/pulumi/pulumi
Author:
Author-email:
License: Apache 2.0
Location: /Users/ovidiu/Library/Python/3.8/lib/python/site-packages
Requires: dill, grpcio, protobuf
Required-by: pulumi-aws, pulumi-kubernetes, pulumi-random, pulumi-tls
Requires
Required-by
Subpackages
Writing a Package
Package Example
A package containing multiple files is available here. A program that consumes it is available here.
Organizatorium
- Each installation of Python may have different modules installed. Python determines the path to its modules by examining the location of the
python3
executable. - Clarify the difference between importing a module and a package.
- The import system: https://docs.python.org/3/reference/import.html
- Technically, a package is a Python module with a
__path__
attribute. - Not clear yet how to define working packages. I've defined a package with an
__init__.py
and a__main__.py
with a function defined inside__main__.py
, and after import I get:
ImportError: cannot import name 'my_test_function' from 'my_package' (/Users/ovidiu/.../main/python/my_package/__init__.py)
- Packages can be run as if they were scripts if the package provides a top-level script
__main__.py
.- https://docs.python.org/3/library/__main__.html
- Understand setup.py - it defines the entry point.