Python Language String: Difference between revisions
Latest revision as of 19:51, 16 May 2024
Internal
- Python Language
- Printing to stdout in Python
- Go Strings
TODO
- PyOOP "Strings and Serialization"
- PyOOP "Strings"
- PyOOP "String manipulation"
- PyOOP "String formatting"
- PyOOP "Escaping braces"
- PyOOP "f-strings can contain Python code"
- PyOOP "Making it look right"
- PyOOP "Custom formatters"
- PyOOP "The format method"
- PyOOP "Strings are Unicode"
- PyOOP "Converting bytes to text"
- PyOOP "Converting text to bytes"
- PyOOP "Mutable byte strings"
Overview
Strings are Python sequences of characters. Strings are immutable: a character within a string cannot be changed in place once the string object has been instantiated. Python 3 supports the Unicode standard, so Python 3 strings can contain characters from any written language in the world.
Strings are sequences of Unicode characters and can therefore be treated like other sequences, such as lists and tuples: they can be sliced or iterated over.
Strings are Immutable
A string cannot be modified with the assignment operator:
s = "something"
s[3] = "s"
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[10], line 1
----> 1 s[3] = "s"
TypeError: 'str' object does not support item assignment
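Because strings are immutable, a "modified" string is always a new object. A minimal sketch of two common ways to build one, either by slicing around the offset or by going through a mutable list of characters (the variable names are illustrative only):

```python
s = "something"

# Build a new string from the parts before and after offset 3.
t = s[:3] + "s" + s[4:]

# Alternative: copy the characters into a mutable list, edit, and join back.
chars = list(s)
chars[3] = "s"
u = "".join(chars)

print(t)  # somsthing
```

The original string `s` is left untouched in both cases.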
Declaring Strings
Quotes
String literals can be declared using four types of quotes: single quotes '...', double quotes "...", triple single quotes '''...''' and triple double quotes """...""".
Single and Double Quotes
s1 = 'abc'
s2 = "xyz"
Declaring string literals bounded by single and double quotes is equivalent. There are two types of quotes to make it possible to create strings that include single and double quotes: a single-quoted string allows specifying double quotes inside and a double-quoted string allows specifying single quotes inside:
s1 = 'the color is "red"'
s2 = "the shape is 'square'"
Three Single and Three Double Quotes
Multi-line string literals can be declared using three single quotes or three double quotes. Leading and trailing whitespace inside such strings is preserved. An attempt to declare a multi-line string literal bounded by single or double quotes results in a SyntaxError exception.
s1 = '''
the color
is
"red"
'''
s2 = """
the shape
is
'square'
"""
print(s1)
print(s2)
The result is:
the color
is
"red"
the shape
is
'square'
F-String (Literal String Interpolation)
An f-string, short for formatted string literal, is a string literal prefixed with "f" that contains expressions inside curly braces. The expressions are replaced with their values when the literal is evaluated. F-strings were introduced by PEP 498. Any kind of string literal (single-quoted, double-quoted or triple-quoted) can be an f-string.
name = "long"
print(f'my name is {name}')
print(f"my name is {name}")
print(f"""
my name is {name}
""")
Format specifiers can be added after each expression:
amount = 10.10
currency = "Euros"
usd = 9
s = f"{amount:0.2f} {currency:s} are worth USD {usd:d}"
print(s) # 10.10 Euros are worth USD 9
Right-align a value in a field 6 characters wide:
i = 10
print(f"{i:>6}")
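The expressions inside the braces are not limited to variable names. A short sketch, with illustrative variable names, showing arithmetic inside an f-string, literal braces produced by doubling them, and the alignment markers `>` and `<`:

```python
width = 3
height = 4

# Any expression is allowed inside the braces, including arithmetic.
area = f"area = {width * height}"

# Doubled braces produce literal braces in the output.
literal = f"{{width}} = {width}"

# '>' right-aligns within the given field width, '<' left-aligns.
right = f"{width:>6}"
left = f"{width:<6}|"

assert area == "area = 12"
assert literal == "{width} = 3"
assert right == "     3"
assert left == "3     |"
```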
Escaped Characters
Python allows escaping the meaning of some characters by preceding the character with a backslash (\). Commonly escaped characters:
- New line: \n, which allows creating a multi-line string from a one-line string literal.
- Tab: \t
- Backslash itself: \\
- Single quote: \' to introduce single quotes in single-quoted strings.
- Double quote: \" to introduce double quotes in double-quoted strings.
Escape sequences are resolved when the string literal is evaluated, so the string object already contains the actual characters; print() sends those characters to stdout unchanged.
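A small sketch of how escape sequences behave once the literal has been evaluated. Each escape becomes a single character, and `repr()` is one way to see the escaped form again:

```python
s = "a\tb\nc"

# Each escape sequence is a single character in the string object.
assert len(s) == 5

# repr() shows the escaped form; print() would emit the real tab and newline.
assert repr(s) == "'a\\tb\\nc'"

# Escaped quotes allow both quote styles inside one literal.
quoted = 'it\'s "quoted"'
assert quoted == "it's \"quoted\""
```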
Empty String
An empty string can be declared using all kinds of quotes described above:
s1 = ''
s2 = ""
s3 = ''''''
s4 = """"""
Empty strings evaluate to False in conditional expressions and can be tested with if:
s = ''
if s:
    print("NOT empty")
else:
    print("empty")
if not s:
    print("empty")
Emptiness can also be tested with:
s = ''
if len(s) != 0:
    print("NOT empty")
else:
    print("empty")
A string that contains blank space is not considered empty.
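A sketch of the difference between an empty string and a whitespace-only string, and of two ways to detect the latter:

```python
blank = "   "

# A whitespace-only string has non-zero length, so it is truthy.
assert bool(blank) is True
assert bool("") is False

# strip() removes the whitespace, which makes the "blank" case testable.
assert not blank.strip()

# isspace() is True only for non-empty, whitespace-only strings.
assert blank.isspace() and not "".isspace()
```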
String type()
The function type() applied to a string returns:
<class 'str'>
To check whether an instance is a string:
i = ...
if type(i) is str:
    ...
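The `type(i) is str` test matches exact `str` instances only. When subclasses of `str` should also count as strings, `isinstance()` is the usual alternative. A sketch, where `Label` is a hypothetical subclass introduced only for illustration:

```python
class Label(str):
    """Hypothetical str subclass, used only for this illustration."""

plain = "abc"
label = Label("abc")

# type(...) is str matches exact str instances only.
assert type(plain) is str
assert type(label) is not str

# isinstance() also accepts instances of str subclasses.
assert isinstance(plain, str)
assert isinstance(label, str)
assert not isinstance(42, str)
```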
Type Conversions from and to String
Convert other Data Types to Strings with str()
Other data types can be converted to string using the str() function. Python uses the str() function internally when print() is called on objects that are not strings, and when doing string interpolation.
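A few conversions as a sketch. `Point` is a hypothetical class, added only to show that `str()` and f-string interpolation both end up calling the object's `__str__` method:

```python
assert str(1) == "1"
assert str(1.5) == "1.5"
assert str(True) == "True"
assert str(None) == "None"
assert str([1, 2]) == "[1, 2]"


class Point:
    """Hypothetical class showing that str() delegates to __str__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return f"({self.x}, {self.y})"


p = Point(1, 2)
assert str(p) == "(1, 2)"
# Interpolation applies the same conversion implicitly.
assert f"{p}" == "(1, 2)"
```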
Check whether a String can be Converted to an int
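Use the isdigit() string method. Note that isdigit() returns False for signed or whitespace-padded strings such as "-10", which int() accepts.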
s = "something"
s2 = "10"
assert s.isdigit() is not True
assert s2.isdigit() is True
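When signed or whitespace-padded input must also be accepted, an alternative is to attempt the conversion and catch the failure. A minimal sketch; `can_be_int` is a hypothetical helper name, not a built-in:

```python
def can_be_int(text):
    """Hypothetical helper: True if int(text) would succeed."""
    try:
        int(text)
    except ValueError:
        return False
    return True


assert can_be_int("10")
assert can_be_int("-10")        # isdigit() is False for this string
assert can_be_int(" 10 ")       # int() ignores surrounding whitespace
assert not can_be_int("something")
assert not can_be_int("1.5")
assert not "-10".isdigit()
```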
Convert a String to List
s = "abc"
l = list(s)
print(l) # ['a', 'b', 'c']
Also see: list()
String Equality
TODO
s1 = "abc"
s2 = "abc"
s3 = "xyz"
assert s1 == s2
assert s2 != s3
Reading Strings
String Length
The length of the string is returned by the built-in len() function.
s = 'abc'
assert len(s) == 3
The [] Operator and String Slices
The [...:...] syntax is called slicing and it is implemented for many kinds of Python sequences.
Reading Individual Characters
The [] operator takes an offset from the beginning of the string, going from left to right (positive) or from the first position after the end of the string, going from right to left (negative) and returns the character.
s = 'abc'
assert s[0] == 'a'
assert s[1] == 'b'
assert s[-1] == 'c'
assert s[-2] == 'b'
The [] operator can be used to read characters from the string, but not to modify it. Because strings are immutable, an attempt to change a character at a specific position in a string raises a TypeError exception:
s = 'abc'
s[0] = 'x'
[...]
TypeError: 'str' object does not support item assignment
If the offset reaches beyond the string boundaries, applying the [] operator results in an IndexError: string index out of range exception.
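A sketch of the out-of-range case and of the valid offset range:

```python
s = "abc"

try:
    s[3]
except IndexError as e:
    message = str(e)

assert message == "string index out of range"

# Valid offsets run from -len(s) to len(s) - 1.
assert s[-len(s)] == "a"
assert s[len(s) - 1] == "c"
```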
Get the First Character of a String
s = "something"
assert s[0] == 's'
Get the Last Character of a String
s = "something"
assert s[-1] == 'g'
Reading Substrings with the Slice Operator [start:end:step]
The slice operator [start:end:step] extracts a substring defined as follows:
start
start specifies the offset of the first character, from the beginning of the string. If it is missing, 0 is implied. Unlike the [] operator, the slice operator does not raise an exception when the start offset falls beyond the end of the string: the offset is clamped to the end of the string and the empty string is returned.
s = 'blue'
assert s[:] == 'blue'
assert s[0:] == 'blue'
assert s[2:] == 'ue'
assert s[4:] == ''
assert s[5:] == ''
Negative values can be used, which mean an offset from the end of the string, instead of the beginning.
s = 'blue'
assert s[-1:] == 'e'
end
end represents the offset of the first character that is not included in the substring. If it is not specified, the end is implied to be the first position past the end of the string, so the rest of the string is included. If the end offset falls beyond the end of the string, the rest of the string is likewise included.
s = 'blue'
assert s[:0] == ''
assert s[:1] == 'b'
assert s[:2] == 'bl'
assert s[:100] == 'blue'
# missing 'end' means the whole end of the string
assert s[2:] == 'ue'
Negative values can be used, which mean an offset from the end of the string, instead of the beginning.
s = 'blue'
assert s[:-2] == 'bl'
step
step selects every step-th character, starting from the start offset. If not specified, the default is 1.
s = 'blue'
assert s[::1] == 'blue'
assert s[::2] == 'bu'
assert s[::3] == 'be'
A negative step value means it steps backward, starting from the rightmost character of the slice. This is how a string is rendered backwards:
s = 'blue'
assert s[-1::-1] == 'eulb'
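The three slice parameters can be combined freely. A sketch on a longer string, with the selected offsets noted in the comments:

```python
s = "something"

assert s[1:4] == "ome"          # offsets 1, 2, 3
assert s[-5:] == "thing"        # last five characters
assert s[:-5] == "some"         # everything except the last five
assert s[::-1] == "gnihtemos"   # same result as s[-1::-1]
assert s[1:8:3] == "otn"        # offsets 1, 4, 7

# Out-of-range slice offsets are clamped instead of raising an exception.
assert s[100:] == ""
assert s[:100] == s
```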
Get First n Characters as a String
s = "something"
n = 4
assert s[0:n] == "some"
assert s[:n] == "some" # equivalent
Iterate over Characters of a String
s = "something"
for c in s:
    print(c)
s = "something"
for i in range(0, len(s)):
    print(s[i])
s = "something"
for i, c in enumerate(s):
    print(i, c)
String Manipulation and Processing
Combine Strings with the + Operator
Strings can be concatenated with the + operator:
'a' + 'b'
To concatenate strings and numbers, use the type conversion function str():
'a' + str(1)
String literals (not variables) can be combined by just typing them one after another:
s = 'this is ' 'a ' 'string'
print(s)
Repeat a String a number of Times: Duplicate with the * Operator
The * operator can be used to duplicate a string:
s = 'Ho ' * 3
print(s, 'said Santa')
If the multiplication factor is 0, the result is the empty string:
assert '' == 'something ' * 0
Split a String
<string-to-be-split>.split(<separator>)
s = 'A, B, C'
l = s.split(', ')
assert l[0] == 'A'
assert l[1] == 'B'
assert l[2] == 'C'
If no separator is specified, split() splits on any run of whitespace characters (newlines, spaces and tabs).
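The behavior with and without an explicit separator differs when separators repeat. A sketch, which also shows the optional second argument that limits the number of splits:

```python
s = "a  b\tc\n"

# No separator: runs of whitespace collapse, no empty strings are produced.
assert s.split() == ["a", "b", "c"]

# Explicit separator: every occurrence splits, so empty strings can appear.
assert "a  b".split(" ") == ["a", "", "b"]

# The optional second argument limits the number of splits.
assert "A, B, C".split(", ", 1) == ["A", "B, C"]
```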
Join a String
<separator>.join(<list_of_strings>)
s = '@'.join(['test', 'example.com'])
assert 'test@example.com' == s
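join() accepts any iterable, but every element must already be a string. A sketch with an illustrative list of numbers that are converted first:

```python
numbers = [1, 2, 3]

# Non-string elements must be converted before joining.
csv = ", ".join(str(n) for n in numbers)
assert csv == "1, 2, 3"

# split() and join() with the same separator round-trip.
assert ", ".join("A, B, C".split(", ")) == "A, B, C"

# Joining non-strings directly raises TypeError.
failed = False
try:
    ", ".join(numbers)
except TypeError:
    failed = True
assert failed
```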
Search and String Statistics Functions
The in Operator
s = 'blue is a color'
if 'is a' in s:
print('found')
startswith()
assert 'something'.startswith('some')
endswith()
assert 'something'.endswith('thing')
find()
Finds the offset of the first occurrence of the given argument, by searching from the beginning of the string:
assert 'something'.find('ome') == 1
If the string is not found, the method returns -1.
rfind()
Finds the offset of the last occurrence of the given argument, by searching from the end of the string:
assert 'elemental'.rfind('e') == 4
If the string is not found, the method returns -1.
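A sketch comparing find() and rfind() on the same string, and using the optional start offset of find() to collect every match:

```python
s = "elemental"

assert s.find("e") == 0     # first occurrence
assert s.rfind("e") == 4    # last occurrence
assert s.find("x") == -1 and s.rfind("x") == -1

# find() accepts an optional start offset, which allows walking all matches.
offsets = []
i = s.find("e")
while i != -1:
    offsets.append(i)
    i = s.find("e", i + 1)

assert offsets == [0, 2, 4]
```

Since -1 is also a valid string offset, the result should be compared against -1 before it is used as an index.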
count()
Counts the non-overlapping occurrences of the given argument:
assert 'We hope you had a pleasant trip from Seattle to San Francisco, and we want to hear about your experience traveling with United'.count('you') == 2
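isalnum()
Returns True if the string is not empty and all of its characters are alphanumeric (letters or digits), otherwise False.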
String Manipulation Functions
"Trim" a String with strip()
Removes characters from both ends of a string. The argument is treated as a set of characters, not as a substring: every leading and trailing character that belongs to the set is stripped, however many times it occurs.
assert '....test..'.strip('.') == 'test'
If no arguments are given, whitespace is assumed:
assert ' \n test \t\t '.strip() == 'test'
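strip() treats its argument as a set of characters rather than a suffix, so it is not a reliable way to remove a file extension. The following assertion holds only because 'SomeFile' neither starts nor ends with '.', 'p' or 'y':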
assert 'SomeFile.py'.strip('.py') == 'SomeFile'
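A sketch of a case where strip() removes more than the extension, together with two alternatives: the standard os.path.splitext() function and a small suffix-removal helper. `drop_suffix` is a hypothetical name introduced for this example:

```python
import os.path

# strip('.py') strips any of '.', 'p', 'y' from both ends, not the suffix.
assert "happy.py".strip(".py") == "ha"

# os.path.splitext() separates the extension reliably.
root, ext = os.path.splitext("happy.py")
assert (root, ext) == ("happy", ".py")


def drop_suffix(name, suffix):
    """Hypothetical helper: remove suffix from name if present."""
    if suffix and name.endswith(suffix):
        return name[:-len(suffix)]
    return name


assert drop_suffix("happy.py", ".py") == "happy"
assert drop_suffix("happy.txt", ".py") == "happy.txt"
```

On Python 3.9 and later, the built-in str.removesuffix() method does the same job as the helper.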
Capitalization capitalize(), title(), upper(), lower(), swapcase()
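capitalize() converts the first character of the string to uppercase and the rest to lowercase.
title() capitalizes the first letter of every word.
isupper() returns True if all cased characters are uppercase and the string contains at least one cased character, otherwise False. Numbers, symbols and spaces are not checked, only alphabetic characters.
upper() converts all characters to uppercase.
islower() returns True if all cased characters are lowercase and the string contains at least one cased character, otherwise False. Numbers, symbols and spaces are not checked, only alphabetic characters.
lower() converts all characters to lowercase.
swapcase() converts uppercase characters to lowercase and lowercase characters to uppercase.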
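A sketch applying the capitalization methods to one mixed-case string:

```python
s = "hello WORLD"

assert s.capitalize() == "Hello world"
assert s.title() == "Hello World"
assert s.upper() == "HELLO WORLD"
assert s.lower() == "hello world"
assert s.swapcase() == "HELLO world"

# Only cased characters are considered; digits and spaces are ignored.
assert "ABC 123".isupper()
assert not "123".isupper()    # no cased characters at all
assert "abc!".islower()
```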
Padding and Alignment with ljust(), rjust(), center()
Leading Padding
Leading padding can be done with rjust(). By default, the padding is done with spaces, but a specific character can be provided.
s = 'abc'
assert '  abc' == s.rjust(5)
assert '__abc' == s.rjust(5, '_')
To fill a string with zeroes to the left use zfill():
assert '004' == '4'.zfill(3)
If the argument is a number, the same effect can be achieved with:
n = 4
assert '004' == f'{n:03}'
In the format specifier 03, the leading 0 requests zero-padding and 3 is the minimum total width of the result.
Trailing Padding
Trailing padding can be done with ljust(). By default, the padding is done with spaces, but a specific character can be provided.
s = 'abc'
assert 'abc  ' == s.ljust(5)
assert 'abc++' == s.ljust(5, '+')
Equivalent method:
s = 'abc'
assert 'abc' == f'{s:2}'
assert 'abc' == f'{s:3}'
assert 'abc ' == f'{s:4}'
The same pattern can be used to produce a whitespace string of a certain length:
s = ' '
assert '   ' == f'{s:3}'
Centering
A string can also be centered, by default between spaces or a specified character.
s = 'abc'
assert ' abc ' == s.center(5)
assert '--abc--' == s.center(7, '-')
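A sketch that combines ljust() and rjust() to line up two columns, followed by the same alignments expressed as format specifiers (fill character, alignment marker, width). The sample rows are made up for illustration:

```python
rows = [("apple", 3), ("kiwi", 12)]

# ljust() pads names on the right, rjust() pads counts on the left.
lines = [name.ljust(8, ".") + str(count).rjust(4, ".") for name, count in rows]
assert lines == ["apple......3", "kiwi......12"]

# The same alignments as format specifiers: fill, align, width.
assert f"{'abc':_<5}" == "abc__"
assert f"{'abc':_>5}" == "__abc"
assert f"{'abc':-^7}" == "--abc--"
```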
replace() Function
The replace() function performs string substitution, creating a new string whose fragments matching the first argument are replaced with the second argument. It has an optional count argument. If the count argument is omitted, all occurrences are replaced.
s = 'tattle tale'
assert 'bittle tale' == s.replace('ta', 'bi', 1)
assert 'bittle bile' == s.replace('ta', 'bi')
To replace matches of a regular expression, see: Python Regular Expressions | Replacing Regular Expression Occurrences
Regular Expressions
String Interpolation
Strings and Unicode
Unicode strings can be converted to the UTF-8 bytes representation using the encode() method, and back from UTF-8 bytes to Unicode with decode():
s = "español"
s2 = s.encode("utf-8")
print(s2) # prints b'espa\xc3\xb1ol'
s3 = s2.decode('utf-8')
print(s3) # prints español
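A sketch of the difference between character count and byte count, and of what happens when the target encoding cannot represent a character:

```python
s = "español"
b = s.encode("utf-8")

# 'ñ' takes two bytes in UTF-8, so the byte count exceeds the character count.
assert len(s) == 7
assert len(b) == 8
assert b.decode("utf-8") == s

# Encoding to a codec that lacks the character fails unless errors= is given.
raised = False
try:
    s.encode("ascii")
except UnicodeEncodeError:
    raised = True
assert raised
assert s.encode("ascii", errors="replace") == b"espa?ol"
```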
Code Examples
Reading a Multi-Line String Line by Line
The literal starts and ends with a newline, so splitting it directly yields two extra empty lines. Stripping the string first avoids them:
mls = """
a
b
c
"""
for line in mls.strip().split('\n'):
    print(line)
Reading a File Line by Line
TO PROCESS: https://www.geeksforgeeks.org/read-a-file-line-by-line-in-python/
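A minimal sketch of one common approach, iterating over the file object itself. The temporary file exists only to keep the example self-contained; its content and name are assumptions, not part of the original notes:

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a\nb\nc\n")
    path = tmp.name

lines = []
with open(path) as f:
    # A file object is iterable and yields one line at a time,
    # each including its trailing newline.
    for line in f:
        lines.append(line.rstrip("\n"))

os.remove(path)
assert lines == ["a", "b", "c"]
```

Iterating over the file object avoids loading the whole file into memory at once.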
Indent a Multiline String
s = "a\n b\n c\n"
indent = 6
s2 = '\n'.join([''.rjust(indent) + i for i in s.split('\n')])
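An alternative sketch using textwrap.indent() from the standard library, compared with the join-based approach on the same illustrative input. The join-based version also pads the empty piece that follows the final newline:

```python
import textwrap

s = "a\nb\nc\n"
indent = 6

# textwrap.indent() prefixes every non-blank line and keeps the final newline.
s2 = textwrap.indent(s, " " * indent)
assert s2 == "      a\n      b\n      c\n"

# The join-based approach leaves trailing spaces after the last newline.
s3 = "\n".join("".rjust(indent) + i for i in s.split("\n"))
assert s3 == "      a\n      b\n      c\n      "
```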