Python Language String

From NovaOrdis Knowledge Base

Internal

Overview

Strings are Python sequences of characters. Strings are immutable: a character within a string cannot be changed in place once the string object has been instantiated. Python 3 supports the Unicode standard, so Python 3 strings can contain characters from any written language in the world.
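
A minimal sketch of what immutability and Unicode support mean in practice. The variable names are illustrative only; the point is that a method such as upper() returns a new string and leaves the original untouched, and that len() counts characters rather than bytes.

```python
# "Modifying" operations return a new string; the original never changes.
original = "abc"
upper = original.upper()
print(original)  # abc
print(upper)     # ABC

# Python 3 strings hold Unicode characters; len() counts characters, not bytes.
word = "日本語"
char_count = len(word)
byte_count = len(word.encode("utf-8"))
print(char_count, byte_count)  # 3 9
```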

Declaring Strings

Quotes

String literals can be declared using four types of quotes: single quotes '...', double quotes "...", triple single quotes '''...''' and triple double quotes """...""".

Single and Double Quotes

s1 = 'abc'
s2 = "xyz"

Declaring string literals bounded by single and double quotes is equivalent. There are two types of quotes to make it possible to create strings that include single and double quotes: a single-quoted string allows specifying double quotes inside and a double-quoted string allows specifying single quotes inside:

s1 = 'the color is "red"'
s2 = "the shape is 'square'"

Three Single and Three Double Quotes

Multi-line string literals can be declared using three single quotes or three double quotes. The leading and trailing whitespace in such strings, including line breaks and indentation, is preserved. An attempt to declare a multi-line string literal bounded by single or double quotes results in a SyntaxError.

s1 = '''
    the color
        is
    "red"
    '''
s2 = """
    the shape
        is
    'square'
    """
print(s1)
print(s2)

The result is:

    the color
        is
    "red"


    the shape
        is
    'square'
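
A small sketch that makes the preserved whitespace visible with repr(), and shows the SyntaxError for a single-quoted literal that spans two lines. compile() is used only so the error can be caught at run time; the source text passed to it is a made-up example.

```python
# Triple-quoted literal: line breaks and indentation become part of the string.
s = '''
    red
    '''
print(repr(s))  # '\n    red\n    '

# A single-quoted literal broken across two lines does not compile.
broken_source = "s = 'the color\nis red'"
try:
    compile(broken_source, "<example>", "exec")
    compiled = True
except SyntaxError:
    compiled = False
print(compiled)  # False
```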

F-String

PEP 498

An f-string is a string literal prefixed with "f" that contains expressions inside braces. The expressions are replaced with their values when the string is evaluated. F-strings were introduced by PEP 498. Any kind of string literal (single-quoted, double-quoted or triple-quoted) can be an f-string.

name = "long"
print(f'my name is {name}')
print(f"my name is {name}")
print(f"""

   my name is {name}

""")

Right-align a value in a field 6 characters wide:

i = 10
print(f"{i:>6}")
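
As an illustration of the format specification used above, the following sketch compares the three alignment options and embeds an arbitrary expression inside the braces. repr() is used so the padding is visible; the variable names are illustrative.

```python
i = 10
name = "long"

right = f"{i:>6}"    # right-aligned in a field of 6 characters
left = f"{i:<6}"     # left-aligned in a field of 6 characters
center = f"{i:^6}"   # centered in a field of 6 characters
expr = f"{name.upper()} has {len(name)} characters"

print(repr(right))   # '    10'
print(repr(left))    # '10    '
print(repr(center))  # '  10  '
print(expr)          # LONG has 4 characters
```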

Escaped Characters

Python allows escaping the meaning of some characters by preceding the character with a backslash (\). Commonly escaped characters:

  • New line: \n, which allows creating a multi-line string from a one-line string.
  • Tab: \t
  • Backslash itself: \\
  • Single quote: \', which introduces single quotes in single-quoted strings.
  • Double quote: \", which introduces double quotes in double-quoted strings.

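Escape sequences are resolved when Python parses the string literal, not by print(): the string object already contains the actual newline or tab character, and print() simply writes it to stdout.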
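
A short sketch of the escape sequences listed above. Each sequence stands for a single character in the resulting string, which the length checks make visible. The sample values are made up for illustration.

```python
one_line = 'first\nsecond'   # \n is one newline character
tabbed = 'a\tb'              # \t is one tab character
backslash = 'C:\\temp'       # \\ is one backslash
single = 'it\'s'             # \' inside a single-quoted string
double = "say \"hi\""        # \" inside a double-quoted string

print(one_line)              # printed on two lines
print(len(tabbed))           # 3
print(backslash)             # C:\temp
lines = one_line.split('\n')
print(lines)                 # ['first', 'second']
```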

Empty String

An empty string can be declared using all kinds of quotes described above:

s1 = ''
s2 = ""
s3 = ''''''
s4 = """"""

Empty strings evaluate to False in conditional expressions and can be tested with if:

s = ''
if s:
  print("NOT empty")
else:
  print("empty")

if not s:
  print("empty")

A string that contains blank space is not considered empty.
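
To illustrate the difference, the sketch below compares an empty string with one that contains only spaces. Using strip() to detect a whitespace-only string is one possible approach, shown here as an assumption about what a caller might want rather than a rule.

```python
empty = ''
blank = '   '

empty_is_truthy = bool(empty)    # False: no characters at all
blank_is_truthy = bool(blank)    # True: spaces are characters too

# strip() removes leading and trailing whitespace, so a whitespace-only
# string becomes empty after stripping.
blank_after_strip = not blank.strip()

print(empty_is_truthy, blank_is_truthy, blank_after_strip)  # False True True
```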

String type()

The function type() applied to a string returns:

<class 'str'>

To check whether an instance is a string:

i = ...
if type(i) is str:
  ...
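
The check above matches only objects whose type is exactly str. As a sketch of an alternative, isinstance() also accepts subclasses of str. The Label class below is hypothetical and exists only to show the difference.

```python
class Label(str):
    """Hypothetical str subclass, defined only for this illustration."""


value = Label("abc")

exact = type(value) is str            # False: the type is Label, not str
compatible = isinstance(value, str)   # True: Label is a subclass of str

print(exact, compatible)  # False True
```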

Convert other Data Types to Strings with str()

Other data types can be converted to string using the str() function. Python uses the str() function internally when print() is called on objects that are not strings, and when doing string interpolation.
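
A minimal sketch of str() applied to a few built-in values, and of an f-string falling back to the same conversion for an object. The Point class is hypothetical and is defined only to show where __str__() comes into play.

```python
class Point:
    """Hypothetical class, defined only for this illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return f"({self.x}, {self.y})"


converted = [str(42), str(3.5), str(True), str(None), str([1, 2])]
print(converted)  # ['42', '3.5', 'True', 'None', '[1, 2]']

p = Point(1, 2)
interpolated = f"point: {p}"   # the f-string uses the object's string form
print(p)                       # (1, 2)
print(interpolated)            # point: (1, 2)
```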

String Equality

Strings are compared by value with the == and != operators:

s1 = "abc"
s2 = "abc"
s3 = "xyz"
assert s1 == s2
assert s2 != s3
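
As a further sketch, the == operator compares string contents regardless of how the string was built, and the comparison is case-sensitive. Using casefold() for a case-insensitive comparison is one common approach, shown here as an illustration.

```python
a = "abc"
b = "".join(["a", "b", "c"])   # same content, built at run time

same_value = a == b                                   # True: compared by content
case_differs = "abc" == "ABC"                         # False: case-sensitive
case_insensitive = "abc".casefold() == "ABC".casefold()  # True

print(same_value, case_differs, case_insensitive)  # True False True
```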

Reading String

https://www.pythonforbeginners.com/basics/string-manipulation-in-python

The [] Operator and String Slices

Reading Individual Characters

The [] operator takes an offset and returns the character at that position. A non-negative offset counts from the beginning of the string, starting at 0 and going from left to right. A negative offset counts from the first position after the end of the string, going from right to left, so -1 is the last character.

s = 'abc'
assert s[0] == 'a'
assert s[1] == 'b'
assert s[-1] == 'c'
assert s[-2] == 'b'

The [] operator can be used to read characters from the sequence, but not to modify the sequence. Because strings are immutable, an attempt to change a character at a specific position in a string raises a TypeError exception:

s = 'abc'
s[0] = 'x'
[...]
TypeError: 'str' object does not support item assignment

If the offset reaches beyond the string boundaries, applying the [] operator results in an IndexError: string index out of range exception.
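
A short sketch of the out-of-range case, catching the exception so the message can be inspected. The same error is raised for a negative offset that reaches before the start of the string.

```python
s = 'abc'

try:
    s[3]               # valid offsets are 0..2
    message = None
except IndexError as e:
    message = str(e)

try:
    s[-4]              # valid negative offsets are -1..-3
    negative_ok = True
except IndexError:
    negative_ok = False

print(message)      # string index out of range
print(negative_ok)  # False
```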

Get the First Character of a String

s = "something"
assert s[0] == 's'

Get the Last Character of a String

s = "something"
assert s[-1] == 'g'

Reading Substrings with the Slice Operator [start:end:step]

The slice operator [start:end:step] extracts a substring defined as follows:

start

start specifies the offset of the first character, from the beginning of the string. If it is missing, 0 is implied. Unlike the [] operator, if the start offset falls beyond the end of the string, no exception is raised and the empty string is returned instead: a value beyond the end of the string is treated as the end of the string.

s = 'blue'
assert s[:] == 'blue'
assert s[0:] == 'blue'
assert s[2:] == 'ue'
assert s[4:] == ''
assert s[5:] == ''

Negative values can be used, which mean an offset from the end of the string, instead of the beginning.

s = 'blue'
assert s[-1:] == 'e'

end

end represents the offset of the first character that is not included in the substring. If it is not specified, the end is implied to be the first position outside of the string, so the rest of the string is included. If the end offset falls outside the string, the rest of the string is included as well.

s = 'blue'
assert s[:0] == ''
assert s[:1] == 'b'
assert s[:2] == 'bl'
assert s[:100] == 'blue'
# missing 'end' means the whole end of the string
assert s[2:] == 'ue'

Negative values can be used, which mean an offset from the end of the string, instead of the beginning.

s = 'blue'
assert s[:-2] == 'bl'

step

step selects every step-th character, starting from the start offset. If not specified, the default is 1.

s = 'blue'
assert s[::1] == 'blue'
assert s[::2] == 'bu'
assert s[::3] == 'be'

A negative step value means it steps backward, starting from the rightmost character of the slice. This is how a string is rendered backwards:

s = 'blue'
assert s[-1::-1] == 'eulb'
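
The sketch below combines start, end and step in single expressions, using a longer sample string so the effect of each part is easier to see. The sample string is illustrative only.

```python
s = 'abcdefgh'

middle = s[2:6]          # offsets 2, 3, 4, 5
every_other = s[2:6:2]   # offsets 2, 4
last_three = s[-3:]      # from the third character from the end
without_last_three = s[:-3]
backwards = s[::-1]      # whole string, right to left
back_slice = s[6:2:-1]   # offsets 6, 5, 4, 3

print(middle, every_other, last_three)            # cdef ce fgh
print(without_last_three, backwards, back_slice)  # abcde hgfedcba gfed
```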

String Manipulation and Processing

Combine Strings with the + Operator

Strings can be concatenated with the + operator:

'a' + 'b'

To concatenate strings and numbers, use type conversion function str():

'a' + str(1)

String literals (not variables) can be combined by just typing them one after another:

s = 'this is ' 'a ' 'string'
print(s)
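
A small sketch of the cases above: concatenating a string and a number without str() raises a TypeError, while adjacent literals are joined even when they are spread over several lines inside parentheses.

```python
# Mixing str and int with + fails; str() makes the conversion explicit.
try:
    'a' + 1
    error_name = None
except TypeError as e:
    error_name = type(e).__name__

combined = 'a' + str(1)

# Adjacent literals are joined into one string, also across lines.
s = ('this is '
     'a '
     'string')

print(error_name)  # TypeError
print(combined)    # a1
print(s)           # this is a string
```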

Duplicate Strings with the * Operator

The * operator can be used to duplicate a string:

s = 'Ho ' * 3
print(s, 'said Santa')
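
As a sketch of what the operator produces, note that the repeated string keeps its trailing space, and that multiplying by 0 yields the empty string. The separator line is a made-up usage example.

```python
s = 'Ho ' * 3
separator = '-' * 10
nothing = 'Ho' * 0

print(repr(s))          # 'Ho Ho Ho '
print(separator)        # ----------
print(repr(nothing))    # ''
```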

replace() Function

The replace() method returns a new string in which occurrences of a substring are replaced by another substring. The original string is not modified.
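
A minimal sketch of replace(), including the optional third argument that limits the number of replacements. The sample text is made up for illustration.

```python
s = 'a duck, a goose, a duck'

replaced_all = s.replace('duck', 'swan')      # every occurrence
replaced_one = s.replace('duck', 'swan', 1)   # only the first occurrence

print(replaced_all)  # a swan, a goose, a swan
print(replaced_one)  # a swan, a goose, a duck
print(s)             # a duck, a goose, a duck (unchanged)
```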

String Functions

Regular Expressions

Regular Expressions

String Interpolation
