Python Language Modularization: Difference between revisions

From NovaOrdis Knowledge Base
Jump to navigation Jump to search
Line 154: Line 154:
   /Users/ovidiu/my-project/venv/lib/python3.9/site-packages
   /Users/ovidiu/my-project/venv/lib/python3.9/site-packages
</font>
</font>
The initial blank output line is the empty string &#39;', which stands for the current directory. The first match will be used. If a module with the same name as a module from standard library is encountered in the search path before the standard library, it will be used instead of the module coming from the standard library.
The initial blank output line is the empty string &#39;', which stands for the current directory. The first match will be used. If a module with the same name as a module from standard library is encountered in the search path before the standard library, it will be used instead of the module coming from the standard library.  
 
<code>sys.path</code> value can be modified as follows:





Revision as of 23:03, 20 June 2022

External

Internal

Overview

Python code can be organized as standalone programs, modules and packages. When a module or a package is published, people refer to it as a library. In this context, the term library is simply a generic term for a bunch of code that was designed with the aim of being reused by many applications. It provides some generic functionality that can be used by specific applications.

Modular programming refers to the process of braking a large unwieldy body of code into separate, smaller, more manageable modules. Individual modules can the be combined into creating a larger application. More details about modular programming is available in Designing Modular Systems article.

Standalone Program

A standalone program consists of one or more files of code that is read by the Python interpreter and executed. A typical way to interact with a Python program is command line arguments. For more details on handling command line arguments, see:

Command Line Argument Processing in Python

Python Script

A script is a module whose aim is to be executed. It has the same meaning as "program", standalone program, or "application", but it is usually used to describe simple and small program. It contains a stored set of instructions that can be handed over to the Python interpreter:

python3 ./my-script.py

Python scripts have .py extensions.

The python code can be specified in-line with a here-doc:

python3 <<EOF
print('hello')
print('hello2')
EOF

The same approach can be taken when Python code needs to be executed from within a bash script, for more details see:

Calling Python from bash | Inline Python Code

Modules

There are three different ways to define a module in Python:

  • A module can be written in Python itself, with the code living in one code file.
  • A module can be written in C and loaded dynamically at runtime. This is the case of the regular expression re module.
  • A module can be intrinsically contained by the interpreter. This is called a built-in module.

The modules written in Python are the most common, and they are the type of modules Python developer usually write. They are exceedingly straightforward to build: just place Python code in a file with the py. A Python module consists of one file. The module can be imported inside another Python program or executed on its own. The module can define object instances such as strings or list, and that are assigned to variables, and also functions and classes. If intended to run on its own, the module will also include runnable code.

The file name is the name of the module with the suffix .py appended. Modules are loaded into Python by the process of importing, where the name of the objects present in one module are made available in the namespace of the calling layer.

Each module has its own global namespace, where all objects defined inside the module - strings, lists, functions, classes, etc. - are identified by unique names.

Module Name

Naming conventions are documented here: https://www.python.org/dev/peps/pep-0008/#id36: Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. The name of the module cannot contain dashes ('-'). The name of the directory the module is stored in can contain dashes.

Built-in Modules

The built-in modules are contained by the Python interpreter.

Importing

Importing means making accessible names available in a module or a package into the namespace associated with the code that invokes the import statement. The import operation binds the imported module's name into the namespace associated with the calling layer.

Importing a Module

The import statement, usually listed at the top of the file, followed by the name of the module, binds the name of the module into the namespace associated with the calling layer. The name of the module is the name of the file containing the module code, without the .py extension.

import mymodule

This statement does nothing except binding the specified module namespace into the current namespace:

print(globals())

[...]

'mymodule': <module 'mymodule' from '/Users/ovidiu/playground/pyhton/modules/mymodule.py'>

This allows the Python code from the current scope to use the module name to access objects contained by the module. The import statement, in this form, does not bind any of the module's objects in the current namespace. Assuming that mymodule declares a function some_func, the function can be invoked by qualifying its name with the name of the module. The function will be looked up in the mymodule global namespace, which is made accessible to the calling layer as mymodule:

import mymodule

mymodule.some_func()

The file corresponding to the module being imported must be accessible, as described in Locating Module Files.

Multiple comma-separated module names can be specified in the same import statement, but various static analysis programs flag this as a style violation:

import mymodule, mymodule2 # The style checker flags this as a style violation

The syntax import <module-name> imports the entire content of the module. After this import statement, everything in the imported module is available to the program that imports it. Specific constructs from an imported module, such as functions, are then referred by prefixing the name of the construct with the name of the module. This is called qualifying the contents of a module with the module's name:

import mymodule

[...]

mymodule.my_func()

Importing a Module from a Function

A module can be imported at the top of the file, or from inside a function. You should consider importing from outside the function if the imported code might be used in more than one place, and from inside if you know its use will be limited. Putting all imports at the top of the file makes all dependencies of the code explicit.

def some_func():
  import mymodule
  [...]
  mymodule.my_func()

Since the function has its own namespace, and the chance of a collision with another identical name is non-existent, we can avoid qualifying the name of the function from the imported module with the name of the module:

def some_func():
  import mymodule
  [...]
  my_func()

Even so, always qualifying the name of the used construct with the name of the module is a safer choice and it is considered good practice.

Importing a Module with Another Name

The constructs of an imported module are qualified with the name of the module to be used. The prefix can be changed, usually to shorten it, by using the as reserved word in the import statement. This syntax effectively renames the module being imported. The same technique is useful if there are two modules with the same name.

import mymodule as m

[...]

m.my_func()

Importing only Specific Constructs from a Module

If not all constructs exposed by an imported module are useful, only individual constructs, such a specific function, can be imported, using the from/import reserved combination:

from mymodule import my_func
[...]

# invoke the function directly, without prefixing it with the name of the module
my_func()

Each part can keep its original name or it can be aliased:

from mymodule import my_func as m_f
[...]
m_f()

Locating Module Files - Module Search Path

The runtime looks at a list of directory names and ZIP files stored in the standard sys module as the variable path (sys.path). The initial value of sys.path is assembled from the following sources:

  • The directory in which the code performing the import is located. This is why an import will aways work if the module file being imported is in the same directory as the code doing the import.
  • The current directory if the script is executed interactively.
  • The list of directories contained in the PYTHONPATH environment variable, if set.
  • An installation-dependent list of directories configured at the time Python is installed.

sys.path value can be accessed and modified:

import sys
for i in sys.path:
  print(i)

 /opt/brew/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python39.zip
 /opt/brew/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9
 /opt/brew/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/lib-dynload
 /Users/ovidiu/my-project/venv/lib/python3.9/site-packages

The initial blank output line is the empty string '', which stands for the current directory. The first match will be used. If a module with the same name as a module from standard library is encountered in the search path before the standard library, it will be used instead of the module coming from the standard library.

sys.path value can be modified as follows:


Extending Module Search Path

Externally

Set the environment variable PYTHONPATH to a colon-separated list of directories to search for imported modules. A directory declared in PYTHONPATH will be inserted at the beginning of the sys.path list when Python starts up.

Internally from the Program

Append to sys.path

Appending an absolute path works:

import sys
sys.path.append('/path/to/search')

Appending a relative path does not work, unless the script is executed from the directory the path is relative to.

import sys
sys.path.append('./my-module') # This does not work unless the Python script is executed from "my-module"'s parent.

To use a relative directory and make append() insensitive to the location the program is run from, use this pattern:

import os
import sys
sys.path.append(os.path.dirname(__file__) + "/my-module")
import my_module

PyCharm trick: If the module name and the parent directory have the same name, PyCharm will stop issuing static analysis error "No module named ..."

Append to site.addsitedir

Another way is to use site.addsitedir to add a directory to sys.path. The difference between this and just plain appending is that when you use addsitedir, it also looks for .pth files within that directory and uses them to possibly add additional directories to sys.path based on the contents of the files.

Package

A package is Python code stored into multiple files, organized into a file hierarchy. A package may contain multiple modules, each stored in its own file, either in the package root directory, or recursively in subdirectories. The package root directory may also optionally contain two files named __init__.py and __main__.py.

some_dir
 └─ some_package_1
     ├─ __init__.py  # Optional
     ├─ __main__.py  # Optional
     ├─ some_module_1.py # Defines some_func_1()
     ├─ some_module_2.py # Defines some_func_2()
     └─ dir_1
         ├─ some_module_3.py # Defines some_func_3()
         └─ dir_2
             └─ some_module_4.py # Defines some_func_4()

A package also allows subpackages.

Packages allow for a hierarchical structuring of the module namespace using dot notation. In the same way that modules avoid collisions between global variable names, packages avoid collision between module names. For example, the urllib package contains several modules: urllib.request, urllib.error, etc.

To import the modules of the package represented above, ensure that the directory some_dir, the parent of some_package_1, is in the module search path and use the following import statements, where a module is identified using dot notation relative to its package name:

import some_package_1.some_module_1
import some_package_1.some_module_2
import some_package_1.dir_1.some_module_3
import some_package_1.dir_1.dir_2.some_module_4

some_package_1.some_module_1.some_func_1()
some_package_1.some_module_2.some_func_2()
some_package_1.dir_1.some_module_3.some_func_3()
some_package_1.dir_1.dir_2.some_module_4.some_func_4()

A slightly more compact version is:

from some_package_1 import some_module_1
from some_package_1 import some_module_2
from some_package_1.dir_1 import some_module_3
from some_package_1.dir_1.dir_2 import some_module_4

some_module_1.some_func_1()
some_module_2.some_func_2()
some_module_3.some_func_3()
some_module_4.some_func_4()

Importing the package itself is syntactically correct, but unless there is an __init__.py, the import does not do anything useful. In particular, it does not place any of the modules in the package in the local namespace:

import some_package_1
print(str(some_package_1))
<module 'some_package_1' (namespace)>

Package Name

See module name above.

Package Kinds

Regular Package

A traditional package, such as a directory containing an __init__.py file.

__init__.py

When a regular package is imported, this __init__.py file is implicitly executed, and the objects it defines are bound to names in the package’s namespace. The __init__.py file can contain the same Python code that any other module can contain, and Python will add some additional attributes to the module when it is imported.

Much of the Python documentation states that the __init__.py file must be present in the package directory, even if as an empty file, for the package to be valid. This was once true. Since Python 3.3, PEP 420 Implicit Namespace Packages were introduced and they allow for the creation of a package without any __init__.py file.

Assuming that __init__.py is declared in some_package_1 , as shown above, and has the following content:

# this is __init__.py
COLOR = 'blue'

then importing the package itself binds the COLOR in the package's namespace, making it accessible to the client program importing the package:

import some_package_1
assert 'blue' == some_package_1.COLOR

A module in the package can access the global variable by importing it in turn. In some_module_1.py:

from some_package_1 import COLOR

[...]

def print_color():
    print(f'color is {COLOR}')

__init__.py can also be used to automatically import the modules from the package, so the clients of the package won't have to import them individually, and the objects from the package's modules will bound to the package namespace.

# this is __init__.py
import some_package_1.some_module_1
import some_package_1.some_module_2
import some_package_1.dir_1.some_module_3
import some_package_1.dir_1.dir_2.some_module_4

For a client of the package:

import some_package_1
some_package_1.some_module_1.some_func_1()
some_package_1.some_module_2.some_func_2()
some_package_1.dir_1.some_module_3.some_func_3()
some_package_1.dir_1.dir_2.some_module_4.some_func_4()

__main__.py

https://docs.python.org/3/library/__main__.html

The file contains the code of the "main" module, which will be imported automatically when the package is imported.

Namespace Package

A PEP 420 package which serves only as a container for subpackages. Namespace packages may have no physical representation, and have no __init__.py file.

Importing * from a Package

TO PROCESS: https://realpython.com/python-modules-packages/#importing-from-a-package

Package Metadata

Name: pulumi
Version: 2.11.2
Summary: Pulumi's Python SDK
Home-page: https://github.com/pulumi/pulumi
Author:
Author-email:
License: Apache 2.0
Location: /Users/ovidiu/Library/Python/3.8/lib/python/site-packages
Requires: dill, grpcio, protobuf
Required-by: pulumi-aws, pulumi-kubernetes, pulumi-random, pulumi-tls

Requires

Required-by

Subpackages

TO PROCESS: https://realpython.com/python-modules-packages/#subpackages

Package Example

A package example is available here:

https://github.com/ovidiuf/playground/tree/master/pyhton/packages/some_package_1

A program that consumes it is available here:

https://github.com/ovidiuf/playground/tree/master/pyhton/packages/consumer-of-packages

Publishing a Python Package in a Repository

Organizatorium

  • Each installation of Python may have different modules installed. Python determines the path to its modules by examining the location of the python3 executable.
  • Technically, a package is a Python module with a __path__ attribute.
  • Packages can be run as if they were scripts if the package provides a top-level script __main__.py.
  • Understand setup.py - it defines the entry point 9/0.