Pointers in Go

From NovaOrdis Knowledge Base

Latest revision as of 22:28, 1 September 2024

External

https://go.dev/ref/spec#Pointer_types
https://go.dev/doc/faq#Pointers

Internal

Go Language
Pointers

Overview

A pointer is a data type that represents a virtual address in memory, usually the address of a memory location referred to by a variable.

A pointer variable can be declared as follows:

var aPtr *int // a pointer to an int; its zero value is nil

A pointer can also be declared inside a function with the short variable declaration and the referencing operator, in which case its type is inferred:

a := 10
aPtr := &a

aPtr is a variable that contains the address of the memory location associated with the variable a. Writing to that location through the pointer, using the dereferencing operator, changes the value of the variable:

*aPtr = 20
println(a) // will display 20
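The following minimal, self-contained program puts the pieces of this overview together: a pointer declared with var starts out as nil, & takes the address of a, and writing through the pointer changes a. The helper name setThroughPointer is hypothetical, introduced only for illustration.

```go
package main

import "fmt"

// setThroughPointer is a hypothetical helper: it writes v into the
// variable that p points to, so the caller observes the change.
func setThroughPointer(p *int, v int) {
	*p = v
}

func main() {
	var aPtr *int            // zero value of a pointer type
	fmt.Println(aPtr == nil) // true

	a := 10
	aPtr = &a // aPtr now holds the address of a
	setThroughPointer(aPtr, 20)
	fmt.Println(a)     // 20
	fmt.Println(*aPtr) // 20
}
```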

Pointer Variable Naming

Review of existing code shows that people do not use special variable names to indicate that a variable contains a pointer: someName is perfectly fine, and someNamePtr is not required. This is in part because the compiler transparently handles the difference between values and pointers in some common cases. For example, a struct field is referred to with the selector operator .<field_name> regardless of whether the variable is a pointer to the struct or contains the struct value.
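A short sketch of the selector behavior described above, using a hypothetical user struct: on a pointer, p.name is shorthand for (*p).name, so the same expression works whether the variable holds the struct or a pointer to it.

```go
package main

import "fmt"

// user is a hypothetical struct used only for this illustration.
type user struct {
	name string
}

func main() {
	u := user{name: "Bill"} // struct value
	p := &u                 // pointer to the same struct

	fmt.Println(u.name)    // Bill
	fmt.Println(p.name)    // Bill: shorthand for (*p).name
	fmt.Println((*p).name) // Bill: explicit dereference

	p.name = "Ann"      // writes through the pointer
	fmt.Println(u.name) // Ann
}
```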

Also see:

Variable Naming

Pointer Type

A pointer type denotes the set of all pointers to variables of a given type, called the base type of the pointer. The base type and the associated pointer type are two different types: a value of one cannot be assigned to a variable of the other. This is what happens when such an assignment is attempted:

./main.go:24: cannot use v2 (type *B) as type B in assignment

A pointer type is written by placing * in front of the base type, which is the type of the pointed-to value. This is the same symbol as the dereferencing operator, but in a type it is part of the type syntax:

*int

The difference between a base type and its associated pointer type also matters when deciding whether the type, or its pointer type, implements an interface. For a discussion on this subject, see:

When does a Type/Pointer Type Implement an Interface?
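As a hedged illustration of why this distinction matters: with a hypothetical Resetter interface whose only method has a pointer receiver, *counter implements the interface but counter does not, because the method set of counter does not include methods declared with pointer receivers. Resetter, counter and implementsResetter are names made up for this sketch.

```go
package main

import "fmt"

// Resetter and counter are hypothetical names used for illustration.
type Resetter interface {
	Reset()
}

type counter struct{ n int }

// Reset has a pointer receiver, so it belongs to the method set of
// *counter but not to the method set of counter.
func (c *counter) Reset() { c.n = 0 }

// implementsResetter reports whether the dynamic type of v implements Resetter.
func implementsResetter(v any) bool {
	_, ok := v.(Resetter)
	return ok
}

func main() {
	c := counter{n: 5}
	fmt.Println(implementsResetter(c))  // false
	fmt.Println(implementsResetter(&c)) // true

	var r Resetter = &c // compiles
	// var r2 Resetter = c // would not compile: Reset has a pointer receiver
	r.Reset()
	fmt.Println(c.n) // 0
}
```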

Go does not support pointer arithmetic. Assuming ptr is a *int, ptr + 1 does not compile:

invalid operation: ptr + 1 (mismatched types *int and int)

and neither does ptr + ptr2:

invalid operation: ptr + ptr2 (operator + not defined on pointer)

Escape Analysis

Once a non-nil value is assigned to a pointer, Go guarantees that the variable being pointed to remains valid for as long as the pointer can be used. This allows a pattern in which what looks like a stack variable is declared inside a function and a pointer to it is returned to the caller. The pointer remains valid after the function returns, because the compiler arranges for the memory location holding the value of i to outlive the call. This is done with escape analysis, the compile-time process by which the compiler determines whether a variable can be stored on the stack or must be allocated on the heap:

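func makeInt() *int {
  i := 10   // local variable
  return &i // i escapes: its address outlives the call
}

Building with -gcflags="-m" makes the compiler print its optimization decisions, including the results of escape analysis:
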
go build -gcflags="-m" cmd/acmd.go
[...]
cmd/acmd.go:4:2: moved to heap: i 
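A minimal runnable version of the example: each call to makeInt returns the address of a fresh variable, and the value is still readable after the function has returned. Whether i actually ends up on the heap is the compiler's decision; the program's behavior does not depend on it.

```go
package main

import "fmt"

// makeInt returns the address of a local variable. Escape analysis
// detects that i outlives the call and allocates it on the heap.
func makeInt() *int {
	i := 10
	return &i
}

func main() {
	p1 := makeInt()
	p2 := makeInt()

	fmt.Println(*p1, *p2) // 10 10: both still valid after makeInt returned
	fmt.Println(p1 == p2) // false: each call creates a new variable

	*p1 = 42
	fmt.Println(*p1, *p2) // 42 10: the two variables are independent
}
```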

How to Tell if a Variable is a Pointer

Use reflect.TypeOf() on the variable. If the variable holds a pointer, the printed result of reflect.TypeOf() starts with "*":

var b *int
fmt.Println(reflect.TypeOf(b)) // will print "*int"

Alternatively, use:

fmt.Printf("%#v\n", b) // will print "(*int)(nil)"
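Printing the type works for inspection; in code, checking the reflect.Kind is less brittle than matching the leading "*" of the type's string. A sketch, where isPointer is a hypothetical helper name and reflect.Pointer requires Go 1.18 or later (older code uses reflect.Ptr):

```go
package main

import (
	"fmt"
	"reflect"
)

// isPointer reports whether the dynamic type of v is a pointer type.
// reflect.TypeOf(nil) returns nil, which is handled explicitly.
func isPointer(v any) bool {
	t := reflect.TypeOf(v)
	return t != nil && t.Kind() == reflect.Pointer
}

func main() {
	var b *int
	n := 5
	fmt.Println(isPointer(b))   // true: a nil *int still has type *int
	fmt.Println(isPointer(&n))  // true
	fmt.Println(isPointer(n))   // false
	fmt.Println(isPointer(nil)) // false: no dynamic type
}
```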

Displaying Pointers

To display the value stored at the address held by the pointer, dereference the pointer:

fmt.Printf("%d\n", *aPtr)

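To display the memory address stored in the pointer in hexadecimal notation, with the "0x" prefix, use %p or %v. They produce the same output for pointers to non-composite types such as *int; for a pointer to a struct, array, slice or map, %v prints the pointed-to value prefixed with & (for example &{1 2}) instead of the address: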

fmt.Printf("%p\n", aPtr)
fmt.Printf("%v\n", aPtr) // same thing

This will print something like the following (the actual address varies from run to run):

0xc000012080

For more details on the pointer, including the type of the data it points to, use:

fmt.Printf("%#v\n", aPtr)

This will print:

(*int)(0xc000012080)

Pointers can also be formatted with the "%X" verb, which prints the address in base 16 with upper-case digits and without the "0x" prefix:

fmt.Printf("%X\n", aPtr)

This will print:

C000094018

Pointer Operators

The pointer data type comes with two operators: & (the referencing operator), and * (the dereferencing operator).

The Referencing Operator &

The referencing operator, also known as the ampersand operator, yields the address, also known as a "reference", of its operand. & should be read as "address of ...". For an operand of type T, the result has the pointer type *T, and it points to the memory location where the value of the referenced variable is stored.

&<variable_name>
color := "blue"
pointerToColor := &color
println(pointerToColor) // prints an address such as "0xc000058720"

The referencing operator works with variables and also with composite literals (struct, array, slice and map literals). The syntax &user{name:"Bill"}, where user is a struct type, is legal.
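The sketch below applies & to several kinds of composite literals (struct, array, slice, map); all of them compile, and each & yields a pointer to a new variable initialized with the literal's value. The user type is hypothetical, mirroring the example above.

```go
package main

import "fmt"

// user is a hypothetical struct mirroring the text above.
type user struct{ name string }

func main() {
	u := &user{name: "Bill"}       // *user
	arr := &[3]int{1, 2, 3}        // *[3]int
	sl := &[]string{"a", "b"}      // *[]string
	m := &map[string]int{"one": 1} // *map[string]int

	u.name = "Ann"         // field access through the pointer
	arr[0] = 10            // indexing a pointer to an array auto-dereferences
	*sl = append(*sl, "c") // a pointer to a slice must be dereferenced explicitly
	(*m)["two"] = 2        // so must a pointer to a map

	fmt.Println(u.name, arr[0], len(*sl), len(*m)) // Ann 10 3 2
}
```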

However, it does not work with basic literals, such as string or integer literals. The following statement produces a compilation error:

s := &"something" // compilation error

To "inline" such a declaration, an anonymous function can be used:

s := func() *string { s := "something"; return &s }()
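Since Go 1.18, a common alternative to the anonymous function is a small generic helper. The name ptrTo below is hypothetical (it is not part of the standard library); it works because the parameter v is an ordinary variable, so its address can be taken, and escape analysis moves it to the heap as needed.

```go
package main

import "fmt"

// ptrTo is a hypothetical generic helper: it returns a pointer to a copy of v.
// v is an ordinary (addressable) parameter variable, so &v is legal.
func ptrTo[T any](v T) *T {
	return &v
}

func main() {
	s := ptrTo("something") // *string
	n := ptrTo(10)          // *int

	fmt.Println(*s, *n) // something 10

	// new(T) is the built-in way to get a pointer to a zero value,
	// which can then be assigned.
	s2 := new(string)
	*s2 = "something"
	fmt.Println(*s2 == *s) // true
}
```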

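Why & works for a struct literal but not for a string literal: the operand of & must be addressable, that is, a variable, a pointer indirection, a slice indexing operation, a field selector of an addressable struct, or an array indexing operation of an addressable array. The Go specification makes an explicit exception for composite literals: taking the address of a composite literal generates a pointer to a unique variable initialized with the literal's value. A basic literal such as "something" or 10 denotes a constant, which has no storage location, so its address cannot be taken.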

The Dereferencing Operator *

The dereferencing operator, also known as the star operator, takes a pointer and yields the variable the pointer's address points to. The operand must be of a pointer type, otherwise the code will not compile. The value thus exposed can be read or written.

*<pointer_variable_name>
color := "blue"
pointerToColor := &color
println(*pointerToColor) // prints "blue"
*pointerToColor = "red"
println(color) // prints "red"
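The description above assumes a non-nil pointer. Dereferencing a nil pointer compiles, but causes a run-time panic, so code that may receive nil usually checks first. A sketch with hypothetical helpers deref, safeDeref and panics:

```go
package main

import "fmt"

// sink keeps the result of deref alive so the dereference is evaluated.
var sink string

// deref reads through p; it panics at run time if p is nil.
func deref(p *string) string {
	return *p
}

// safeDeref is a hypothetical helper that checks for nil before dereferencing.
func safeDeref(p *string) (string, bool) {
	if p == nil {
		return "", false
	}
	return *p, true
}

// panics reports whether calling f causes a panic.
func panics(f func()) (didPanic bool) {
	defer func() {
		if recover() != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	color := "blue"
	fmt.Println(safeDeref(&color)) // blue true

	var missing *string
	v, ok := safeDeref(missing)
	fmt.Printf("%q %v\n", v, ok) // "" false

	fmt.Println(panics(func() { sink = deref(missing) })) // true
}
```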

When to Use Values and When to Use Pointers

If it makes sense for your use case, prefer using values and design your types so that zero values make logical sense and can be used by default.

However, there are some situations when pointers make sense.

Performance is usually not a good argument. Passing pointers instead of values is often not faster and can be slower, mostly because Go is a garbage-collected language. When the address of a variable is taken and passed around, the compiler's escape analysis may decide that the variable cannot stay on the stack and must be allocated on the heap. Heap-allocated data has to be tracked by the garbage collector, so the more data ends up on the heap, the more work the GC has to do; data on the stack needs no GC, only push/pop operations. The balance shifts for large values: copying large structs on every pass-by-value call can cost more than the GC overhead of sharing them through a pointer.

Mutability. If a struct owned by the caller needs to be mutated from inside a function, this may be a good argument for using a pointer. By default Go passes arguments by value: the entire struct is copied and the function modifies the copy. However, mutability can be problematic in concurrent situations, and a function free of side effects is safer to use. A classical example of a function that returns a new value instead of modifying the slice it was given is append(); its result must be assigned back (note that it may still write into the shared backing array when the slice has spare capacity):

a := []int{1}
a = append(a, 2)
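A sketch contrasting the three styles discussed above, using a hypothetical counter type: a value parameter modifies only a copy, a pointer parameter mutates the caller's struct, and a side-effect-free function returns a modified copy that the caller assigns back, in the same spirit as append().

```go
package main

import "fmt"

// counter is a hypothetical struct used for illustration.
type counter struct{ n int }

// incValue receives a copy; the caller's counter is unchanged.
func incValue(c counter) { c.n++ }

// incPointer mutates the caller's counter through the pointer.
func incPointer(c *counter) { c.n++ }

// incReturn leaves the caller's counter alone and returns a modified copy.
func incReturn(c counter) counter {
	c.n++
	return c
}

func main() {
	c := counter{}
	incValue(c)
	fmt.Println(c.n) // 0
	incPointer(&c)
	fmt.Println(c.n) // 1
	c = incReturn(c)
	fmt.Println(c.n) // 2
}
```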

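Pointer Receivers. If a type needs at least one pointer receiver, it is a good idea to use pointer receivers for all of its methods; the Go Code Review Comments guidelines advise against mixing receiver types. The compiler itself accepts mixed receivers, but some linters and IDE inspections flag them. See: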

Mixing Value and Pointer Receiver Types

To model true absence. If values are passed around, true absence of a value cannot really be modeled, because a missing value is always replaced by the zero value of the type, and it is impossible to tell whether the zero value means a legitimate zero or absence. In this case, a nil pointer can represent true absence. The alternative to using a pointer is an additional boolean that provides "present" semantics.
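A minimal sketch of both approaches, using hypothetical types: a *int field distinguishes "not set" (nil) from an explicit 0, and the alternative pairs the value with a boolean "present" flag, similar in spirit to database/sql.NullInt64 with its Valid field.

```go
package main

import "fmt"

// options is hypothetical: a nil Retries means "not specified".
type options struct {
	Retries *int
}

// optionalInt is the hypothetical value-plus-flag alternative.
type optionalInt struct {
	Value int
	Valid bool
}

// retriesOrDefault treats nil as absent, but honors an explicit 0.
func retriesOrDefault(o options, def int) int {
	if o.Retries == nil {
		return def
	}
	return *o.Retries
}

// Or returns the stored value if present, otherwise def.
func (o optionalInt) Or(def int) int {
	if !o.Valid {
		return def
	}
	return o.Value
}

func main() {
	zero := 0
	fmt.Println(retriesOrDefault(options{}, 3))               // 3: absent
	fmt.Println(retriesOrDefault(options{Retries: &zero}, 3)) // 0: explicitly zero

	fmt.Println(optionalInt{}.Or(3))                      // 3: absent
	fmt.Println(optionalInt{Value: 0, Valid: true}.Or(3)) // 0: explicitly zero
}
```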

Pointers Lead to Values, the Reciprocal is Not Always True

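This fact is important for method sets. Given a pointer, the value it points to can always be obtained by dereferencing it; given a value, its address can only be taken if the value is addressable (a value stored in an interface or in a map, for example, is not). This is why the method set of a pointer type *T includes the methods declared with value receivers on T, while the method set of T does not include the methods declared with pointer receivers.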

Also see:

Method Set for Type and Method Set for Pointer to Type

Pointers and Interfaces

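TODO:

Go Interfaces | Interface Values
Go Interfaces | Interfaces as Function Parameters and Result Values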