Go Strings
Revision as of 00:07, 11 January 2024
Overview
Strings mainly hold text meant to be displayed and read. In Go, a string is a read-only slice of bytes. The language and the standard library treat strings as sequences of Unicode code points encoded in UTF-8, a variable-length encoding that uses one to four bytes per code point. Where other languages call a string's components "characters", Go calls them "runes": values of the rune type, an alias for int32, each representing one Unicode code point. It is OK to refer to them as "characters".
Strings are immutable.
A string variable that is not explicitly initialized is implicitly initialized with the empty string.
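The two properties above can be illustrated with a minimal sketch. The helper replaceFirstByte is a hypothetical name introduced only for this example; it shows that "changing" a string means building a new one from a copy of its bytes.

```go
package main

import "fmt"

// replaceFirstByte (hypothetical helper) returns a copy of s whose first
// byte is replaced by b. Strings are immutable, so the change is made on
// a []byte copy; writing s[0] = b directly would not compile.
func replaceFirstByte(s string, b byte) string {
	if s == "" {
		return s
	}
	bs := []byte(s) // copies the string's bytes
	bs[0] = b
	return string(bs)
}

func main() {
	var s string                 // not explicitly initialized
	fmt.Println(s == "", len(s)) // true 0

	orig := "cat"
	changed := replaceFirstByte(orig, 'b')
	fmt.Println(orig, changed) // cat bat
}
```

The original string is left untouched; only the returned copy differs.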
Declaration
The pre-declared type identifier for strings is string.
var s string // string type declaration without initialization
s = "example 1" // initialization after declaration
var s2 string = "example 2" // variable initialization in declaration
var s3 = "example 3" // variable initialization with type inference
s4 := "example 4" // short variable declaration
Literals
A string literal is a string constant produced by concatenating characters. Go has two kinds of string literals: interpreted string literals and raw string literals.
Interpreted String Literals
An interpreted string literal is represented in Go code as a sequence of characters enclosed in double quotes. The text is stored as UTF-8, so a single character may occupy more than one byte. Interpreted strings allow escape sequences such as \n or \t.
s := "something\nsomething else"
println(s)
Raw String Literals
Raw string literals are sequences of characters enclosed in backquotes (backticks): `. Everything between the pair of matching backticks is taken literally: backslashes have no special meaning and newlines can appear. Carriage return characters inside raw string literals are discarded.
s := `This
is an \n \t
example of
raw string literal`
println(s)
will produce:
This
is an \n \t
example of
raw string literal
A typical pattern to declare string constants in packages is:
const config = `
something:
somethingelse:
- a
- b
- c
`
Raw string literals are commonly used to declare SQL statements with embedded quotes:
sql := `CREATE TABLE IF NOT EXISTS TEST ("ID" int, "NAME" varchar(10))`
Empty String
emptyString1 := ""
emptyString2 := ``
Operators
Indexing Operator []
The indexing operator [] returns a byte (uint8). It does NOT return a rune.
Strings are zero-based indexed. If the index is out of bounds, the runtime generates a run-time panic:
panic: runtime error: index out of range [6] with length 3
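A short sketch of the byte-level behavior described above. The helper byteAt is a hypothetical name used only here; it checks the bounds first so that an invalid index does not cause a panic.

```go
package main

import "fmt"

// byteAt (hypothetical helper) returns the byte at index i and true, or
// 0 and false when i is out of range, so the caller never triggers a
// run-time panic.
func byteAt(s string, i int) (byte, bool) {
	if i < 0 || i >= len(s) {
		return 0, false
	}
	return s[i], true
}

func main() {
	s := "A→B"
	fmt.Printf("%T\n", s[0])  // uint8
	fmt.Printf("%#x\n", s[1]) // 0xe2, the first of the three bytes encoding "→"
	if _, ok := byteAt(s, 10); !ok {
		fmt.Println("index 10 is out of range")
	}
}
```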
Concatenation Operator +
The concatenation operator + joins two strings together, producing a new immutable string instance. An attempt to concatenate a string and an int, for example, does not compile, because Go does not automatically convert the int to a string the way Java does.
s := "abc"
s2 := "xyz"
println(s + s2)
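Since an int is not converted automatically, the conversion has to be written out. The sketch below uses strconv.Itoa for that; the function name label is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// label (hypothetical helper) concatenates a name and a number.
// The int is converted explicitly with strconv.Itoa; name + n would not compile.
func label(name string, n int) string {
	return name + "-" + strconv.Itoa(n)
}

func main() {
	fmt.Println(label("item", 42)) // item-42
}
```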
Equality Operator ==
String equality is tested with the == operator:
s := "blue"
s2 := "blue"
if s == s2 {
println("strings are equal")
}
Reading Strings
String Length
The number of bytes used to store a string is obtained by invoking the built-in function len() on the string.
Note that len() does NOT necessarily return the number of characters (runes) in the string. If a Unicode character is encoded in more than one byte, the len() result will be different from the number of characters in the string.
s := "A"
println(len(s)) // will display 1
s = "→"
println(len(s)) // will display 3, "→" requires 3 bytes to be encoded in UTF-8
The number of characters in a string is returned by the utf8.RuneCountInString() function:
import "unicode/utf8"
// ...
s := "A"
println(utf8.RuneCountInString(s)) // will display 1
s = "→"
println(utf8.RuneCountInString(s)) // will display 1
The number of characters in a string can also be obtained by applying len() to the following type conversion:
s := "A"
println(len([]rune(s))) // will display 1
s = "→"
println(len([]rune(s))) // will display 1
Reading Characters from a String
"Characters" and "runes" are equivalent in this context. The characters are represented internally as rune instances. Note that the indexing operator applied directly to the string does not return characters (rune) but bytes (uint8).
Read Individual Characters
Convert the string to a rune slice and apply the indexing operator to that slice:
s := "A→B"
rs := []rune(s)
fmt.Printf("character 0: %c\n", rs[0]) // will display "A"
fmt.Printf("character 1: %c\n", rs[1]) // will display "→"
fmt.Printf("character 2: %c\n", rs[2]) // will display "B"
Also see Indexing Operator [] above.
Iterate over Characters
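Use the range keyword to iterate over the string's characters. The first value produced by range is the byte offset at which the character starts, not the character's ordinal position: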
s := "A→B"
for pos, c := range s {
fmt.Printf("position: %d, character: %c, type: %s\n", pos, c, reflect.TypeOf(c))
}
will display:
position: 0, character: A, type: int32
position: 1, character: →, type: int32
position: 4, character: B, type: int32
Introspecting Characters
The unicode package provides a set of functions to introspect characters for specific properties, such as whether they are a digit, a space, a letter, a punctuation character, whether they are lower case or upper case, etc.
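As a rough illustration, the sketch below classifies runes with a few functions from the unicode package. The function classify and its labels are hypothetical; only the unicode.Is... calls come from the standard library.

```go
package main

import (
	"fmt"
	"unicode"
)

// classify (hypothetical helper) returns a short description of the rune r
// based on the first matching property.
func classify(r rune) string {
	switch {
	case unicode.IsDigit(r):
		return "digit"
	case unicode.IsSpace(r):
		return "space"
	case unicode.IsUpper(r):
		return "upper-case letter"
	case unicode.IsLower(r):
		return "lower-case letter"
	case unicode.IsLetter(r):
		return "letter"
	case unicode.IsPunct(r):
		return "punctuation"
	default:
		return "other"
	}
}

func main() {
	for _, r := range "aB3 !" {
		fmt.Printf("%q: %s\n", r, classify(r))
	}
}
```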
String Manipulation and Processing in Go
The strings Package
String Comparison with Compare()
Compare() is a lexicographical string comparison function in the strings package.
import "strings"
a := "ABC"
b := "XYZ"
println(strings.Compare(a, b)) // prints -1 for a < b
println(strings.Compare(b, a)) // prints 1 because b > a
println(strings.Compare(a, "ABC")) // prints 0 for a == b
Contains()
A function of the strings package that returns true if substr is inside s:
import "strings"
strings.Contains(s, substr)
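A small runnable sketch of Contains(). The helper containsAll is hypothetical and only shows how the call composes.

```go
package main

import (
	"fmt"
	"strings"
)

// containsAll (hypothetical helper) reports whether s contains every one
// of the given substrings.
func containsAll(s string, subs ...string) bool {
	for _, sub := range subs {
		if !strings.Contains(s, sub) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(strings.Contains("seafood", "foo"))    // true
	fmt.Println(strings.Contains("seafood", "bar"))    // false
	fmt.Println(strings.Contains("seafood", ""))       // true: the empty string is always contained
	fmt.Println(containsAll("seafood", "sea", "food")) // true
}
```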
HasPrefix()
HasPrefix(s, prefix) is a function in the strings package that returns true if s starts with the prefix.
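For illustration, a minimal sketch of a prefix test. The function isHTTPS is a hypothetical name chosen for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// isHTTPS (hypothetical helper) reports whether url starts with "https://".
func isHTTPS(url string) bool {
	return strings.HasPrefix(url, "https://")
}

func main() {
	fmt.Println(isHTTPS("https://go.dev")) // true
	fmt.Println(isHTTPS("http://go.dev"))  // false
}
```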
HasSuffix()
HasSuffix(s, suffix) is a function in the strings package that returns true if s ends with the suffix.
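A minimal sketch of a suffix test, for example to recognize file names by extension. The function isGoFile is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// isGoFile (hypothetical helper) reports whether name ends with ".go".
func isGoFile(name string) bool {
	return strings.HasSuffix(name, ".go")
}

func main() {
	fmt.Println(isGoFile("main.go"))   // true
	fmt.Println(isGoFile("README.md")) // false
}
```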
Index()
Index(s, substr) is a function in the strings package that searches inside the string s for the substring substr and returns the byte index of the first occurrence of substr if it exists, or -1 otherwise.
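The sketch below shows both outcomes of Index() and uses the result to slice the string. The helper before is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// before (hypothetical helper) returns the part of s that precedes the
// first occurrence of sep, or s itself when sep does not occur.
func before(s, sep string) string {
	i := strings.Index(s, sep)
	if i < 0 {
		return s
	}
	return s[:i]
}

func main() {
	fmt.Println(strings.Index("chicken", "ken")) // 4
	fmt.Println(strings.Index("chicken", "dmr")) // -1
	fmt.Println(strings.Index("A→B", "B"))       // 4, a byte index: "→" occupies 3 bytes
	fmt.Println(before("key=value", "="))        // key
}
```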
Count()
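Count(s, substr) is a function in the strings package that returns the number of non-overlapping instances of substr in s. If substr is the empty string, Count() returns 1 + the number of Unicode code points in s.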
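A short sketch of Count(). The helper lineCount is hypothetical and assumes the text has no trailing newline.

```go
package main

import (
	"fmt"
	"strings"
)

// lineCount (hypothetical helper) counts the lines of a text that has no
// trailing newline by counting its newline characters.
func lineCount(text string) int {
	return strings.Count(text, "\n") + 1
}

func main() {
	fmt.Println(strings.Count("cheese", "e")) // 3
	fmt.Println(strings.Count("aaaa", "aa"))  // 2: matches do not overlap
	fmt.Println(strings.Count("five", ""))    // 5: 1 + number of code points
	fmt.Println(lineCount("a\nb\nc"))         // 3
}
```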
Join()
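Join(elems, sep) is a function in the strings package that concatenates the elements of the string slice elems into a single string, placing the separator sep between consecutive elements.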
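A minimal sketch of Join(). The helper joinPath is a hypothetical name, not a replacement for the path packages.

```go
package main

import (
	"fmt"
	"strings"
)

// joinPath (hypothetical helper) joins the given parts with "/".
func joinPath(parts ...string) string {
	return strings.Join(parts, "/")
}

func main() {
	fmt.Println(strings.Join([]string{"a", "b", "c"}, ", ")) // a, b, c
	fmt.Println(joinPath("usr", "local", "bin"))             // usr/local/bin
}
```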
Split()
Split(s, sep string) is a function in the strings package. Split() slices the input string s into all substrings separated by the sep separator and returns a slice of the substrings between those separators.
If two of the separators occur successively, the result will contain an empty string, corresponding to the position between those separators. If you want to handle consecutive white-spaces as one white-space area, Fields() is a better option.
If s does not contain sep and sep is not empty, Split() returns a slice of length 1 whose only element is s.
If sep is empty, Split() splits after each UTF-8 sequence.
If both s and sep are empty, Split() returns an empty slice.
The call is equivalent to SplitN() with a count of -1. To split around the first instance of a separator, see Cut().
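The cases listed above can be checked with a short sketch. The helper nonEmptyParts is hypothetical; it shows one way to drop the empty strings produced by consecutive separators.

```go
package main

import (
	"fmt"
	"strings"
)

// nonEmptyParts (hypothetical helper) splits s around sep and drops the
// empty strings that consecutive separators produce.
func nonEmptyParts(s, sep string) []string {
	var out []string
	for _, p := range strings.Split(s, sep) {
		if p != "" {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	fmt.Printf("%q\n", strings.Split("a,b,c", ","))  // ["a" "b" "c"]
	fmt.Printf("%q\n", strings.Split("a,,b", ","))   // ["a" "" "b"]
	fmt.Printf("%q\n", strings.Split("abc", ","))    // ["abc"]
	fmt.Printf("%q\n", strings.Split("a→b", ""))     // ["a" "→" "b"]
	fmt.Println(len(strings.Split("", "")))          // 0
	fmt.Printf("%q\n", nonEmptyParts("a,,b", ","))   // ["a" "b"]
}
```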
Fields()
Fields() takes a string and breaks it down into tokens (fields) separated by one or more consecutive white space characters. It returns a string slice.
fields := strings.Fields(line) // returns a []string
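A runnable sketch of the same call. The helper wordCount is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// wordCount (hypothetical helper) returns the number of
// whitespace-separated fields in line.
func wordCount(line string) int {
	return len(strings.Fields(line))
}

func main() {
	fmt.Printf("%q\n", strings.Fields("  foo bar\t baz \n")) // ["foo" "bar" "baz"]
	fmt.Println(wordCount("  foo bar\t baz \n"))             // 3
}
```

Leading and trailing white space produces no empty fields, unlike Split() with a space separator.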
Replace()
Replace(s, old, new, n) is a function in the strings package that replaces the first n instances of the old substring with the new substring. If n is negative, all instances are replaced. The string s is not modified; the function returns a new string instance.
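A short sketch of Replace() with a limited and an unlimited count. The helper replaceFirst is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// replaceFirst (hypothetical helper) replaces only the first occurrence
// of old in s with repl.
func replaceFirst(s, old, repl string) string {
	return strings.Replace(s, old, repl, 1)
}

func main() {
	s := "oink oink oink"
	fmt.Println(strings.Replace(s, "k", "ky", 2))  // oinky oinky oink
	fmt.Println(strings.Replace(s, "k", "ky", -1)) // oinky oinky oinky
	fmt.Println(replaceFirst("a-b-c", "-", "+"))   // a+b-c
	fmt.Println(s)                                 // oink oink oink (unchanged)
}
```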
ToLower()
ToLower(s) is a function in the strings package that changes the whole string to lower case. The original string s is not modified; a new string instance is created and returned.
ToUpper()
ToUpper(s) is a function in the strings package that changes the whole string to upper case. The original string s is not modified; a new string instance is created and returned.
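A minimal sketch of both case conversions. The helper sameIgnoringCase is a hypothetical, simplistic comparison; the standard library also offers strings.EqualFold() for case-insensitive equality.

```go
package main

import (
	"fmt"
	"strings"
)

// sameIgnoringCase (hypothetical helper) compares two strings after
// lower-casing both. It is a simplification, not full Unicode case folding.
func sameIgnoringCase(a, b string) bool {
	return strings.ToLower(a) == strings.ToLower(b)
}

func main() {
	s := "Hello, Gopher"
	fmt.Println(strings.ToLower(s))          // hello, gopher
	fmt.Println(strings.ToUpper(s))          // HELLO, GOPHER
	fmt.Println(s)                           // Hello, Gopher (unchanged)
	fmt.Println(sameIgnoringCase("Go", "GO")) // true
}
```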
TrimSpace()
TrimSpace(s) is a function in the strings package that returns a new string with all leading and trailing white space, as defined by Unicode, removed.
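A short sketch of TrimSpace(). The helper isBlank is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// isBlank (hypothetical helper) reports whether s is empty or consists
// only of white space.
func isBlank(s string) bool {
	return strings.TrimSpace(s) == ""
}

func main() {
	fmt.Printf("%q\n", strings.TrimSpace(" \t hello world \n")) // "hello world"
	fmt.Println(isBlank(" \t\n"))                               // true
}
```

White space inside the string is preserved; only the two ends are trimmed.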
Trim()
Trim(s, cutset) returns a slice of the string s with all leading and trailing Unicode code points contained in cutset removed.
s := ...
s = strings.Trim(s, "\n")
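Note that cutset is a set of characters, not a prefix or suffix string. A minimal sketch, with the hypothetical helper stripSlashes:

```go
package main

import (
	"fmt"
	"strings"
)

// stripSlashes (hypothetical helper) removes all leading and trailing
// "/" characters from path.
func stripSlashes(path string) string {
	return strings.Trim(path, "/")
}

func main() {
	fmt.Println(strings.Trim("xxhixyx", "xy")) // hi: every leading or trailing x or y is removed
	fmt.Println(stripSlashes("//a/b/"))        // a/b
}
```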
String Conversions
Conversion with strconv Functions
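As a sketch of what this section covers, the strconv package converts between strings and basic types. The helper parsePort and its error message are hypothetical; strconv.Itoa(), strconv.Atoi() and strconv.FormatBool() are standard library functions.

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePort (hypothetical helper) converts a decimal string to an int and
// wraps the strconv error with some context.
func parsePort(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("invalid port %q: %w", s, err)
	}
	return n, nil
}

func main() {
	fmt.Println(strconv.Itoa(42) + "!")   // 42!
	fmt.Println(strconv.FormatBool(true)) // true

	n, err := parsePort("8080")
	fmt.Println(n, err) // 8080 <nil>

	_, err = parsePort("http")
	fmt.Println(err != nil) // true
}
```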
Conversion of a byte to string
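One possible approach, sketched under the assumption that the goal is a one-byte string: wrap the byte in a []byte before converting. Converting through rune instead treats the value as a code point, which gives a different result for values above 0x7f. Both helper names are hypothetical.

```go
package main

import "fmt"

// byteToString (hypothetical helper) returns a string holding exactly the
// single byte b.
func byteToString(b byte) string {
	return string([]byte{b})
}

// byteAsCodePoint (hypothetical helper) interprets b as a Unicode code
// point and returns its UTF-8 encoding.
func byteAsCodePoint(b byte) string {
	return string(rune(b))
}

func main() {
	fmt.Println(byteToString('A'))          // A
	fmt.Println(len(byteToString(0xe2)))    // 1
	fmt.Println(len(byteAsCodePoint(0xe2))) // 2: U+00E2 needs two bytes in UTF-8
}
```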
TO DISTRIBUTE
Reading a String with a Reader
TO PROCESS:
strings.NewReader()
See Go_Package_strings#NewReader.28.29
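A minimal sketch of strings.NewReader(): it wraps a string in an io.Reader, so the string can be passed to code that expects a reader, here a bufio.Scanner. The helper countLines is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countLines (hypothetical helper) counts the lines of s by scanning a
// reader created over the string.
func countLines(s string) int {
	scanner := bufio.NewScanner(strings.NewReader(s))
	n := 0
	for scanner.Scan() {
		n++
	}
	return n
}

func main() {
	fmt.Println(countLines("first\nsecond\nthird")) // 3
}
```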