Go Packages: Difference between revisions

From NovaOrdis Knowledge Base
Jump to navigation Jump to search
Line 467: Line 467:
</font>
</font>
</font>
</font>
=The "pkg" Package=
<font color=darkkhaki>[[MiGo]] Chapter 2 Public packages</font>


=Vendoring=
=Vendoring=

Revision as of 23:33, 22 December 2023

External

Internal

Overview

Go modularization is built upon the concept of package.

A Go package is a collection of constants, variables, functions and type definitions, such as structs and interfaces that are connected to each other and provide related functionality. These constants, variables, etc. are referred to as members, or features of the package. All elements of a package exist in source files stored in the same directory, and that are compiled together as a unit. No Go source code may exist outside a package.

The intention behind bundling program elements together is to make the design and the maintenance of large programs practical. Packages are units that can be easier understood and changed, and that can also evolve independently of other packages. These characteristics allow packages to be shared, distributed and reused by different and otherwise independent projects. As such, a package should contain code that has a single purpose.

Packages provide a namespace for their members. They also provide an encapsulation mechanism for their code, by hiding implementation details and only exposing features, such as variables, types or functions that are meant to be publicly consumed. Packages are one of Go approaches to reusing code, according to the Don't Repeat Yourself principle. Keeping pre-compiled packages around speeds up the compilation process, because a package whose source code did not change does not need recompilation.

Packages can be used by themselves, and they also can be published as part of modules. Modules have been introduced in Go 1.11.

How to Organize Packages

Process this again: https://www.youtube.com/watch?v=MzTcsI6tn-0

If you are going to have multiple packages, it is probably a good idea to orient them around two categories:

  1. business domain types
  2. services

Use this organization rather than building around of accidents of implementation.

The domain types are types that model the business functionality and objects. Services are packages that operate on or with the domain types.

The packages that contain the domain types should also define the interfaces between your domain types and the rest of the world. These interfaces define the things you want to do with your domain types: Employee, ProductService, SupplierService, etc.

The domain type package, or the root package of your application should not have any external dependencies. They only exist for the purpose of defining your types and their dependencies.

The implementation of your domain interfaces should be in separate packages (sub-packages), organized by dependency. Dependencies are external data sources, transport logic (http, RPC), etc. You should have one package per dependency.

Import Path

Packages are consumed by importing them with the import keyword. For details on the syntax, see the Import Statement section, below.

When importing a package, one uses an import path, not the package name, which is explained in detail below:

import "<import-path>"

Example:

import "a/b/c"

The import path is a file system path-like string that uniquely identifies a package. The language specification does not define what the string means, it is left to the tool that performs the import and compiles the source code to interpret it.

In case the package in question is not part of a module, the import path is the relative path of the local file system directory that contains the package source files. The path is relative to a src directory whose parent is listed in GOPATH. All source files that make the package must be stored in the same directory, referred to as package directory.

. ← the directory that must be listed in GOPATH to make the package visible to its consumers
│ 
└─ src
   └─ a
      └─ b
         └─ c
            ├─ file1.go
            ├─ file2.go
            └─ file3.go

In the example above, file1.go, file2.go and file3.go are all declared as part of package x.

file1.go may look like:

package x

var SomeValue = 10

// ...

To use the features exported by package x, the consuming package must import the package x using its import path "a/b/c", and must use its features by prefixing them with the package name:

package consumer

import "a/b/c"

func F() {
  println(x.SomeValue)
}

As an example, the public variable N exported by package y is used by prefixing it with the package name. The resulting identifier x.SomeValue is referred to as a qualified identifier.

However, the package name to use for qualified identifiers is not obvious, at least if one looks at the import line. The import path does not have to have anything in common with the package name. This makes this very generic, yet legal case slightly confusing, from a code readability point of view. The reader may ask: where does "x" come from? The package name is known to the compiler, of course, but the programmer must look into a package source file to figure out the name. The readability can be improved with naming conventions and syntax, as shown below.

For more details on declaring packages, see Declaring Packages section, below.

If the consumed package is part of a module, the package import path starts with the module path, followed by path of the directory containing the package source files, relative to the root of the module. Module path is explained here. The root of the module is the directory containing the go.mod file.

.
├─ go.mod  # declares that this is module "example.com"
│
└─ a
   └─ b
      └─ c
         ├─ file1.go
         ├─ file2.go
         └─ file3.go 

The same import and usage syntax applies, the only difference being that the package import path includes the module path:

package consumer

import "example.com/a/b/c"

func F() {
  println(x.SomeValue)
}

As in the previous "package-only" example, the package name does not show anywhere in the import line. To make the code more readable, the following conversions are recommended:

Package Name

The package name is the identifier that follows the package keyword, in all source files of the package:

package x // the package name

...

As explained above, the package name is not required to show up in the name of any of the file system elements associated with the package, such as package directory or package source files, though code readability can be improved by using the same name for the package and its hosting directory.

The package name must comply with the Go name requirements. For more advice on idiomatic package naming, see Idiomatic Package Conventions below.

When imported into a consuming package, the package name becomes part of the qualified identifiers that are used to access the public features of the package.

The package name can be changed by the consuming code one a per-file basis by renaming the import.

Conventions to Improve Code Readability

Use the Same Name for Package and Package Directory

If possible, use the same name for the package directory and package name. Using different names is legal, but using the same name will improve readability by turning this:

package consumer

import "a/b/c"

func F() {
  println(x.SomeValue)
}

into this:

package consumer

import "a/b/x" // the reader gets a hint where "x" is coming from

func F() {
  println(x.SomeValue)
}

The second version is slightly reader-friendlier, giving at least a hint where "x" is coming from.

However, there are situations when the name of the package directory must be different from the package name. There are at least three cases when this situation is justified.

  1. Packages defining an executable command. The name of the package has to be main, regardless of the name of the directory containing the command source files. If you can also name the package directory "main", that would be ideal, but that is not always possible.
  2. Some files in the directory may have the suffix _test on their package name if the file name ends in _test.go. Such a directory will define two packages: the normal one and an external test package. The _test suffix signals to go test that it must build both packages, and specifies which files belong to each packages. Also see Go Testing.
  3. Some dependency management tool append version number suffixes to package import paths.

Declare the Package Name in Import Line

If using the same name for package and package directory is not an option, the readability can be improved by using a syntactic feature of the import statement which allows explicitly specifying the package name in the import statement:

package consumer

import x "a/b/c" // unequivocally clarifies where 'x' is coming from

func F() {
  println(x.SomeValue)
}

Idiomatic Package Conventions

Write small packages, with a focused API, aimed at performing a single task.

Package names are generally given lowercase, single-word names. They should be short, concise and evocative, but not cryptic. Avoid long package names. The contents of a package should be consistent with the name. If you start noticing that a package includes extra logic that has no relationship to the package name, consider exporting it as a separate one, or use a more descriptive package name. Generally, you will find that the easier it is to give a short and specific self-descriptive name to a package, the better your code composition is. Package names usually take the singular form. Standard library packages bytes, errors and strings use the plural to avoid naming conflicts with the corresponding pre-declared types or keywords. Package names with underscores, hyphens or mixed caps should be avoided.

The name of the package and the name of the directory hosting the package source files should match if possible, unless you are in one of these situations.

The package members intended to be exported must start with an uppercase character. Only expose the package elements that we explicitly want other package to use, and hide everything else. For more details on exporting package members, see the Exporting Package Members section below.

Since each reference to the member of an imported package uses the imported package name as qualifier, the package name will provide context for its content. As such, the burden of describing the package semantics is borne equally by the package name and the member name. Design the names to work together, like in yaml.Reader. yaml.YAMLReader would be redundant.

Document packages. For small packages, add a comment line above the package declaration describing its contents, as shown in the Declaring Packages section. This only works if the package has only one file. What to do if the package has more than one file? Where to place the documentation? For large packages, use a doc.go file.

Separate private code using an internal directory.

Also see:

Go Style

Typical Package Layout and Files

.
├── file1.go
├── file1_test.go
├── file2.go
├── file2_test.go
├── ...
├── main.go # The file containing the main() function.
├── doc.go # Package documentation. A separate file is not necessary for small packages.
├── README.md # A README file written in Markdown
├── LICENSE
└── CONTRIBUTING.md

Modularization Considerations

Packages as Namespaces

Each package defines a distinct namespace that hosts all its identifiers. Within the package namespace, all identifiers must be unique.

When a public feature of a package, such as a function or a type is used outside of the package, its name must prefixed with the name of the package, with the intention of making the package and feature name pair unique in the universe block. Such a name is referred to as qualified identifier. For example, the Println() function, exported by the fmt package, is invoked with the fmt.Println qualified identifier:

import "fmt"
...
fmt.Println("hello")

This mechanism allows us to chose short, succinct names for the members of a package, without creating conflicts with other parts of the program.

Packages as Encapsulation Mechanism

The package is the most important mechanism for encapsulation in Go programs. Packages control which names are visible outside the package, by a simple convention: all names declared in the package block or field names and method names that start with an uppercase letter are publicly visible. Publicly visible names are referred to as exported names. For more details on exporting package members, see the Exporting Package Members section below.

Encapsulation is useful because it allows the package maintainer to change the implementation of the hidden members without affecting the public interface of the package, thus allowing the package to evolve without breaking its dependents. Restricting variables' visibility constrains the package clients to access and update them only through exported functions that preserve internal invariants and enforce mutual exclusion in concurrent programs.

Dependencies

Each import declaration established a dependency from the importing package to the imported package. go build detects dependency cycles:

package main
	imports a
	imports c
	imports b
	imports a: import cycle not allowed

Declaring Packages

Packages are defined by writing one or more source files that declare association with the package. Every source file that is part of the implementation of a package must start with the package statement, which consists of the package keyword followed by the name of the package the source file belongs to.

// Package color is a package that provides functionality around colors
package color

...

All files sharing the same package name form the implementation of the package. The identifiers declared in any of the files of a package are accessible from all other files of the same package.

All source files of a package inhabit the same file system directory, referred to as package directory.

A directory may not contain source files belonging to different packages. In such a situation, the compiler raises an error:

found packages colors (colors.go) and shapes (shapes.go) in .../src/colors

It is legal to declare packages that live in different package directories but use the same package name. They can even be used together, as long as the consuming package is aware they're different packages and makes that syntactically obvious by using different aliases:

.
└─ src
    ├─ a
    │  └─ file1.go # declares a package named "x"
    ├─ b
    │  └─ file2.go # declares a different package, also named "x"
    └─ consumer
       └─ consumer.go

file1.go:

package x

var SomeValue = 10

file2.go:

package x

var SomeValue = 20

consumer.go:

package consumer

import (
  "a"
  x2 "b"
)

func F() {
  println(x.SomeValue)
  println(x2.SomeValue)
}

F() will print:

10
20

Exporting Package Members

A package identifier is made publicly visible outside of the package if:

  1. The name is declared in the package block or is a struct field or a method name.
  2. The name starts with uppercase letter.

Publicly visible names are referred to as exported names. Code that belongs to another package and that imports the package can access the exported names. Names that are not explicitly exported are not visible publicly. "Hidden" or "unexported" is also used in this situation. This is how encapsulation is implemented for packages. Note that identifiers, and not values, are exported or unexported. It is a good practice to only expose the package elements that we explicitly want other package to use, and hide everything else. This allows to change the hidden parts later, without affecting the package's consumers.

In the following example:

package color

var color string = "blue"

func Color() string { 
  return color
}

the color variable is private to the package, but it can be accessed with the public Color() function.

Just because an identifier is unexported, it does not mean other packages can't indirectly access it. A function can return the value of an unexported type and this value is accessible by any calling function, even if the calling function has been declared in a different package. This can be coupled with the short declaration operator :=, which infers the type of the variable. A variable of that type cannot be explicitly declared with var.

Embedded Type Export

Structs | Embedded Type Export

Building Packages

Building with go build

go build

Building with GoLand

Building with GoLand

Publishing Packages

When a package file changes, the entire package must be recompiled and all the packages that depend on it.

Publishing Locally

go install

Publishing Remotely

Consuming Packages

Import Statement

https://go.dev/ref/spec#Import_declarations

Packages are consumed with an import statement that consists in the import keyword, followed by one or more import paths. Consuming a package means accessing and using a package's exported identifiers. When a package is imported during compilation, the compiler searches the local filesystem according to the rules specified here:

Module-Aware or GOPATH Mode

Each package can be imported and used individually, so developers can import only the functionality they need. The import statement syntax follows:

import "fmt"
import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

Imported packages may be grouped introducing blank lines. By conventions, the names in a group are sorted alphabetically,

Once imported, the package names are used as qualifiers in the qualified identifiers of package's exported names. The package names are inferred automatically by the compiler, but they can be locally changed by renaming imports.

Renaming Imports

A different qualifier can be declared, providing an alias for the package name. This is called renaming an import:

import myQualifier "a/b/c"
...
println(myQualifier.SomeValue)

This feature is useful when two packages have the same name when their import paths differ. As an example, math/rand and crypto/rand have the same package name rand. In this situation, one of the imports must be renamed. Renaming only affects the importing file.

Blank Imports

It is a compilation error to import a package but not refer to it. However, there are situation when we want to import a package merely for the initialization side effects. This can be achieved using the blank identifier at import:

import _ "image/png" // this registers the PNG decoding upon import

This is known as a blank import.

Executables and the "main" Package

A binary executable is created when the Go build tool encounters a main package among the packages provided as arguments to go build or go install. In both cases, the build logic invokes the linker that assembles a self-contained executable from the main package and its dependencies. The linker uses the package as a start point and walks the package dependency graph when building the binary executable.

The main package must declare a main() function, otherwise the compiler raises an error:

runtime.main_main·f: function main is undeclared in the main package

The executable will be named differently, and placed in different locations depending on command line arguments and the value of certain environment variables when the build logic is executed. For more details see:

Package Initialization

Package initialization begins by initializing package-level variables in the order in which they are declared, resolving the dependencies first. If the package has multiple files, they are initialized in the order in which they are provided to the compiler. The go tool sorts them by name before invoking the compiler.

The initialization continues with the invocation of the init() functions, if they exist. Any file may contain any number of functions with the following signature:

func init() {
  ...
}

Such init() function cannot be called or referenced, but they are automatically executed in the order in which they are declared, when the package they're part of is imported. One package is initialized at a time, the dependencies are fully initialized before the dependents are initialized, so the init() functions in dependencies are executed before the init() functions declared in dependents. The main is the last to be initialized.

Single-Type Packages

Packages that expose one principal data type, plus its methods and often a New() function to create instances are referred to as single-type packages. The enclosed type is always qualified with the package name when used, so the package names should be short.

Internal (Private) Packages

The module's code that should not be imported by the module's dependents should be placed in a directory named internal, or in one of its recursive subdirectories. The packages declared under a directory named internal are prevented by the compiler to be imported by packages that live outside the source subtree in which the internal packages resides. internal is official, this pattern is enforced by the Go compiler. When the compiler sees an import of a package with internal in its path, it verifies that the package doing this import is within the directory tree rooted at the parent of the internal directory. Packages declared in internal siblings directories can import.

In the following example, the package a can import from package b, because a has the internal parent as its ancestor:

.
├── go.mod
└── level0
    ├── internal
    │   └── b
    │       └── b.go
    └── level1-1
        └── pkg
            └── a
                └── a.go

Normally, the projects use a less convoluted directory structure, where internal and pkg are siblings in the module root directory.

This is what happens in GoLang if a package tries to import from an internal directory:

Go Internal Package.png


A typical project layout employs this pattern if it makes sense. See:

Go Project internal directory

TODO:

Hierarchical Packages

Hierarchical packages or nested packages.

When necessary, use a descriptive parent package and several children packages implementing functionality, like the standard library encoding package:

./encoding/
├── ascii85                // package ascii85
│   ├── ascii85.go
│   └── ascii85_test.go
├── asn1
│   ├── asn1.go            // package asn1
│   ├── asn1_test.go
│   ├── common.go
│   └─ ...
├── base32
│   └── ...
├── ...
├── xml
│    └── ...
│
└── encoding.go           // package encoding

Vendoring

"Vendoring" is the act of making a local copy of a third party dependency package a project depends on. This copy is traditionally placed inside the project source tree and then saved in the project repository.

The associated directory structure is described here:

Workspace

The code below a directory named "vendor" is importable only by code in the directory tree rooted at the parent of "vendor", and only using an import path that omits the prefix up to and including the vendor element. Code in vendor directories deeper in the source tree shadows code in higher directories. Within the subtree rooted at foo, an import of crash/bang resolves to foo/vendor/crash/bang, not the top-level crash/bang. Code in vendor directories is not subject to import path checking. When go get checks out or updates a Git repository, it now also updates submodules. Vendor directories do not affect the placement of new repositories being checked out for the first time by go get. Those are always placed in the main GOPATH, never in a vendor subtree.

Locating Third-Party Packages

Use:

https://pkg.go.dev/

to find packages with functionality you might find useful.

Package Global State

We should strive to maintain little to no package global state. See 'Explicit Component Dependencies' why. The only function of init() is to initialize package global state. using it is a serious red flag in almost any program.

init()

TO PROCESS: https://david-yappeter.medium.com/init-in-go-programming-31e2c2bc2371#:~:text=A%20way%20to%20do%20this,the%20initialization%20of%20global%20variables

TODO