NumPy ndarray: Difference between revisions
Latest revision as of 17:30, 21 May 2024
Internal
Overview
ndarray is an N-dimensional array object. It is a fast, flexible container for large datasets in Python, and it is used to implement a Pandas Series. It allows performing mathematical operations on whole blocks of data using syntax similar to the equivalent operations between scalar elements. It also allows applying the same mathematical operation, or function, to all array elements without the need to write loops. This approach is called vectorization. Evaluating operations between differently sized arrays is called broadcasting. Examples are provided in the Array Arithmetic section.
ndarrays are homogeneous: all elements of an ndarray instance have the same data type. The data type is exposed by the array's dtype attribute. The array's dimensions are exposed by the shape attribute.
ndarrays can be created by converting Python data structures, using generators, or initializing blocks of memory of specified shape with specified values. Once created, array sections can be selected with indexing and slicing syntax.
ndarray Geometry
The number of dimensions is reported by the ndim attribute.
Shape. Each array also has a shape tuple that indicates the size of each dimension, along the axes 0, 1, 2, etc. The length of the shape tuple is equal to ndim:
import numpy as np
a = np.array([['A', 'B', 'C', 'D'], ['E', 'F', 'G', 'H'], ['I', 'J', 'K', 'L']])
a.ndim
2
a.shape
(3, 4)
Axes. It is useful to think of axis 0 as "rows" and axis 1 as "columns":
a = np.array([['A', 'B', 'C', 'D'], ['E', 'F', 'G', 'H'], ['I', 'J', 'K', 'L']])
                     row 0                 row 1                 row 2

                         Axis 1
         ┌──────────────────────────────────▶
         │          0     1     2     3
         │       ┌─────┬─────┬─────┬─────┐
         │       │  A  │  B  │  C  │  D  │
         │  a0 0 │ a0,0│ a0,1│ a0,2│ a0,3│
         │       ├─────┼─────┼─────┼─────┤
         │       │  E  │  F  │  G  │  H  │
 Axis 0  │  a1 1 │ a1,0│ a1,1│ a1,2│ a1,3│
         │       ├─────┼─────┼─────┼─────┤
         │       │  I  │  J  │  K  │  L  │
         │  a2 2 │ a2,0│ a2,1│ a2,2│ a2,3│
         │       └─────┴─────┴─────┴─────┘
         │
         ▼
In multi-dimensional arrays, if you omit later indices, the returned object is a lower-dimensional array consisting of all the data along the remaining dimensions. For arrays in the default C (row-major) layout, that data is stored in a contiguous area of memory.
a = np.array([[['A', 'B'], ['C', 'D']], [['E', 'F'], ['G', 'H']]]) # a 2 x 2 x 2 array
a[0] # the first contiguous 2 x 2 array:
array([['A', 'B'], ['C', 'D']], dtype='<U1')
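As a hedged illustration of omitting later indices, the sketch below uses a hypothetical 2 x 2 x 2 integer array (the names cube, block and row are illustrative and do not come from the text above). It suggests that the result has fewer dimensions and is a view rather than a copy:

```python
import numpy as np

# Hypothetical 2 x 2 x 2 array holding the integers 0..7.
cube = np.arange(8).reshape((2, 2, 2))

block = cube[0]    # one index given: a 2 x 2 array
row = cube[1, 0]   # two indices given: a one-dimensional array of length 2

# The results are views, so writing through them changes the original array.
block[0, 0] = -1
```

After the assignment, cube[0, 0, 0] holds -1 as well, because block shares its storage with cube.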
ndarray Creation
Convert Python Data Structures with array() and asarray()
The np.array() function takes Python data structures, such as lists, lists of lists, tuples, and other sequence types, and generates an ndarray of the corresponding shape. By default, it copies the input data. For example, a bi-dimensional 3 x 3 ndarray can be created by providing a list of 3 lists, each of the enclosed lists containing 3 elements:
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
The Python data structures provided as arguments to np.array() determine the array's geometry: nested sequences will be converted to multidimensional arrays. If the structure is irregular, recent NumPy versions raise a ValueError whose message mentions an "inhomogeneous" shape. Unless the data type is explicitly provided as an argument of the function, array() tries to infer a good data type for the array it creates.
array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0, like=None)
The data structure to generate the array from must be provided as the first argument.
To enforce a specific data type:
a = np.array(..., dtype=np.float64)
asarray() is similar to array(), with the exception that it does not copy the input if it is already an ndarray of a matching data type.
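The difference between the two functions can be sketched as follows. The variable names are hypothetical, and the example assumes the input is already an ndarray and no dtype conversion is requested:

```python
import numpy as np

source = np.array([1, 2, 3])

copied = np.array(source)     # array() copies the data by default
aliased = np.asarray(source)  # asarray() returns the input array itself

copied[0] = 100   # does not affect source
aliased[1] = 200  # modifies source, because no copy was made
```

In this sketch, source ends up as array([1, 200, 3]): only the write through aliased is visible in it.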
By Specifying Shape and Value
zeros(), zeros_like()
To create an array of a specific shape filled with floating-point zeroes:
a = np.zeros((2, 3))
array([[0., 0., 0.], [0., 0., 0.]])
zeros_like() takes another array and produces a zeros array of the same shape and data type:
a = np.array([5, 10])
b = np.zeros_like(a)
array([0, 0]) # the dtype is dtype('int64')
ones(), ones_like()
To create an array of a specific shape filled with floating-point ones:
a = np.ones((1, 2))
array([[1., 1.]])
ones_like() takes another array and produces a ones array of the same shape and data type:
a = np.array([5, 10])
b = np.ones_like(a)
array([1, 1]) # the dtype is dtype('int64')
empty(), empty_like()
numpy.empty() creates an array with the given shape without initializing the memory to any particular value. You should not rely on the values present in such an array, and you should only use the function if you intend to explicitly initialize the array.
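A minimal sketch of the intended usage pattern, assuming the array is fully overwritten before it is read (the name buffer and the fill value are hypothetical):

```python
import numpy as np

# Allocate without initializing; the contents are arbitrary at this point.
buffer = np.empty((2, 3))

# Explicitly initialize every element before reading anything back.
buffer[:] = 7.0
```

empty_like() follows the same pattern, taking the shape and data type from an existing array.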
full(), full_like()
Produce an array of the given shape and data type with all values set to a given value.
a = np.full((2, 3), 5)
array([[5, 5, 5], [5, 5, 5]])
b = np.full_like(a, 6)
array([[6, 6, 6], [6, 6, 6]])
eye(), identity()
Creates a square N x N identity matrix:
a = np.eye(5)
array([[1., 0., 0., 0., 0.], [0., 1., 0., 0., 0.], [0., 0., 1., 0., 0.], [0., 0., 0., 1., 0.], [0., 0., 0., 0., 1.]])
arange()
The arange() function is the array-valued counterpart of the Python range() function. It returns a unidimensional array populated with the values that an equivalent range() call would produce. Unlike range(), it also accepts non-integer arguments.
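A short sketch of typical arange() calls follows. The variable names are illustrative only:

```python
import numpy as np

first_five = np.arange(5)      # stop only: 0, 1, 2, 3, 4
stepped = np.arange(2, 10, 3)  # start, stop, step: 2, 5, 8
halves = np.arange(0, 2, 0.5)  # non-integer step: 0.0, 0.5, 1.0, 1.5
```

As with range(), the stop value is excluded from the result.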
With Generators
Element Data Type
Data types are a source of NumPy's flexibility for interacting with data coming from other systems. In most cases, they provide a mapping directly onto an underlying disk or memory representation, which makes it possible to read and write binary streams of data to disk. The numerical data types are named the same way: a type name like float or int, followed by a number indicating the number of bits per element.
The dtype attribute of an ndarray describes the data type of the array. Since ndarrays are homogeneous, all elements have the same data type.
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]])
a.dtype
dtype('int64')
A dtype object representing a specific data type can be created with:
dt = np.dtype('float64')
The corresponding scalar type is also declared in the numpy namespace, and it compares equal to the dtype object:
assert np.float64 == np.dtype('float64')
Data Types
Type | Type Code | Description |
---|---|---|
int8, uint8 | i1, u1 | Signed/unsigned 1 byte integer |
int16, uint16 | i2, u2 | Signed/unsigned 2 byte integer |
int32, uint32 | i4, u4 | Signed/unsigned 4 byte integer |
int64, uint64 | i8, u8 | Signed/unsigned 8 byte integer |
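float16 | f2 | Half-precision floating point |
float32 | f4, f | Single-precision floating point |
float64 | f8, d | Double-precision floating point, compatible with the Python float |
float128 | f16, g | Extended-precision floating point (availability is platform-dependent) |
complex64, complex128, complex256 | c8, c16, c32 | Complex numbers represented by two 32, 64, or 128 bit floats, respectively |
bool | ? | Boolean type storing True and False values |
object | O | Python object type; a value can be any Python object |
string_ | S | Fixed-length byte string type (named bytes_ in NumPy 2.0 and later). String data in NumPy is fixed size and may truncate input without warning. |
unicode_ | U | Fixed-length Unicode string type (named str_ in NumPy 2.0 and later) |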
Casting an Array to a Different Data Type
Casting can be performed with the array method astype(). By default, calling astype() creates a new array and makes a copy of the data, even if the new data type is the same as the old data type. The conversion is NOT performed in place.
a = np.ones((1))
assert a.dtype == np.float64
b = a.astype(np.int64)
array([1])
If casting fails because the conversion cannot be done, a ValueError exception will be raised.
An array of strings representing numbers can be converted to numeric form with astype():
a = np.array(["5.3", "1.1", "10.3"])
b = a.astype(np.float64)
assert b.dtype == np.float64
array([ 5.3, 1.1, 10.3])
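The failure case can be sketched as follows. The input strings and variable names are hypothetical. The last line additionally illustrates that casting floating-point values to an integer type discards the fractional part:

```python
import numpy as np

values = np.array(["5.3", "not a number"])

try:
    values.astype(np.float64)
    conversion_failed = False
except ValueError:
    # Raised because "not a number" cannot be parsed as a float.
    conversion_failed = True

# Float to integer casts truncate toward zero instead of rounding.
truncated = np.array([1.7, -2.9]).astype(np.int64)
```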
Array Indexing and Slicing
By indexing we mean selecting a subset of an array using the index operator [...] and a single numeric index. By slicing we mean selecting a subset of an array using the index operator [...] and a slice expression A:B. Indices and slice expressions can be combined within the same index operator.
Unidimensional Array Slices
With unidimensional arrays, the slice operator selects elements similarly to the Python slice operator:
a = np.array([1, 2, 3, 4, 5])
b = a[1:4]
array([2, 3, 4])
However, unlike Python list slices, which are copies of the underlying list, NumPy ndarray slices are a view into the original array, providing direct access to the underlying array. The data is not copied, and any modification to the view will be reflected in the array. The reason is that NumPy has been designed to work with very large arrays, so avoiding copying data is part of this approach.
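The contrast between list slices and ndarray slices can be sketched as follows (all names are illustrative):

```python
import numpy as np

values = [1, 2, 3, 4, 5]
list_slice = values[1:4]   # a Python list slice is a copy
list_slice[0] = 99         # the original list is unchanged

array = np.array(values)
array_slice = array[1:4]   # an ndarray slice is a view
array_slice[0] = 99        # the original array is changed as well
```

After these assignments, values[1] is still 2, while array[1] is 99.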
To make a copy of the underlying data, invoke the copy() method on the slice:
a = np.ones((3), np.int64)
b = a[0:2].copy()
b[0] = 10
assert a[0] == 1 # the underlying array has not been changed
Assigning a scalar value to a slice propagates (broadcasts) that value to the entire selection.
a = np.ones((5), np.int64) # array([1, 1, 1, 1, 1])
a[1:4] = 2 # array([1, 2, 2, 2, 1])
The bare slice [:] will assign to all values in an array:
a = np.ones((5), np.int64) # array([1, 1, 1, 1, 1])
a[:] = 2 # array([2, 2, 2, 2, 2])
Multi-dimensional Array Indexing and Slicing
For multi-dimensional arrays, a slice returns a view into a restricted selection of the multi-dimensional array. The slice selects a range of elements along an axis. A colon by itself (:) means an entire axis. Each selected element of the slice is a lower-dimensional component.
In case of a two-dimensional array, the slice selection contains one-dimensional vectors corresponding to the slice indices:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
It is useful to think about the following slice expression as a slice of rows ("select the first two rows of the array"):
a[:2]
  ┌───┬───┬───┐
0 │ 1 │ 2 │ 3 │
  ├───┼───┼───┤
1 │ 4 │ 5 │ 6 │
  ├───┼───┼───┤
  │   │   │   │
  └───┴───┴───┘
array([[1, 2, 3],
       [4, 5, 6]])
Note that a[:2] and a[0:2] are equivalent.
Individual elements can be accessed recursively:
assert a[0][1] == 2
An equivalent notation uses commas to separate indices:
assert a[0, 1] == 2
Multiple slices can be passed like you pass multiple indices: a[1:3, 2:].
In the following case, multiple rows and multiple columns are selected:
a[:2, 1:]
        1   2
  ┌───┬───┬───┐
0 │   │ 2 │ 3 │
  ├───┼───┼───┤
1 │   │ 5 │ 6 │
  ├───┼───┼───┤
  │   │   │   │
  └───┴───┴───┘
array([[2, 3],
       [5, 6]])
The shape is (2, 2).
In this case, all rows but just one column are selected:
a[:, :1]
    0
  ┌───┬───┬───┐
0 │ 1 │   │   │
  ├───┼───┼───┤
1 │ 4 │   │   │
  ├───┼───┼───┤
2 │ 7 │   │   │
  └───┴───┴───┘
array([[1],
       [4],
       [7]])
The shape is (3, 1).
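Interestingly, if we select multiple rows, but the second argument is an index, not a slice, we don't get a "column" of shape (3, 1), but a one-dimensional array of shape (3,). The reason is that an integer index removes the axis it is applied to, whereas a slice keeps it: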
a[:, 0]
array([1, 4, 7])
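The two selections can be compared by inspecting their shapes. The sketch below reuses the 3 x 3 array from this section; the names with_slice, with_index and restored are hypothetical:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

with_slice = a[:, :1]  # the slice keeps axis 1: shape (3, 1)
with_index = a[:, 0]   # the integer index drops axis 1: shape (3,)

# np.newaxis can be used to put the dropped axis back.
restored = with_index[:, np.newaxis]
```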
Boolean Indexing
Fancy Indexing
Array Methods
copy()
Can be used with slices to make a copy of the underlying data, instead of offering direct access to the storage of the source array.
reshape()
np.arange(32).reshape((8, 4))
array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11], [12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23], [24, 25, 26, 27], [28, 29, 30, 31]])
transpose()
See Transposing Arrays below.
swapaxes()
swapaxes() takes a pair of axis numbers and switches the indicated axes to rearrange the data. swapaxes() returns a view of the data without making a copy.
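A minimal sketch of swapaxes() on a hypothetical 2 x 3 array, which also suggests that the result shares storage with the original:

```python
import numpy as np

a = np.arange(6).reshape((2, 3))  # [[0, 1, 2], [3, 4, 5]]

swapped = a.swapaxes(0, 1)        # shape (3, 2)

# Writing through the view changes the original array.
swapped[0, 1] = 99                # same element as a[1, 0]
```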
Array Arithmetic
Vectorization
Any arithmetic operation between equal-size arrays applies the operation element-wise:
a = np.full((2, 3), 2)
b = np.full((2, 3), 3)
a + b
array([[5, 5, 5], [5, 5, 5]])
Arithmetic operations with scalars propagate the scalar argument to each element in the array:
a = np.full((2, 3), 2)
2 * a
array([[4, 4, 4], [4, 4, 4]])
Comparisons between arrays of the same shape yield Boolean arrays of the same shape.
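Operations between arrays of different shapes rely on broadcasting. The following is a hedged sketch, assuming a 2 x 3 array combined with a one-dimensional array of length 3 (the names grid and row are illustrative); the shorter array is applied to every row of the larger one:

```python
import numpy as np

grid = np.full((2, 3), 2)
row = np.array([10, 20, 30])

total = grid + row   # row is broadcast across both rows of grid
smaller = grid < row # comparisons broadcast in the same way
```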
Vectorized Comparison
Like arithmetic operations, comparisons (such as ==) are vectorized. Applying such a comparison to an array results in a boolean array:
a = np.array(['A', 'B', 'C', 'A', 'A', 'D'])
a == 'A'
array([ True, False, False, True, True, False])
Additional variations of the comparison syntax:
a != 'A'
a = np.array([1, 2, 3, 4, 5])
a > 3
array([False, False, False, True, True])
Note that these boolean arrays can be used in boolean indexing.
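As a brief sketch of that use, a boolean array produced by a comparison can select the matching elements (the name mask is illustrative):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

mask = a > 3        # array([False, False, False,  True,  True])
selected = a[mask]  # keeps only the elements where mask is True
```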
Transposing Arrays
Transposing is a special form of reshaping that returns a view of the underlying data, without copying anything.
Arrays have a transpose() method. They also have a T attribute:
a = np.arange(32).reshape((8, 4))
array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11], [12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23], [24, 25, 26, 27], [28, 29, 30, 31]])
a.T
array([[ 0, 4, 8, 12, 16, 20, 24, 28], [ 1, 5, 9, 13, 17, 21, 25, 29], [ 2, 6, 10, 14, 18, 22, 26, 30], [ 3, 7, 11, 15, 19, 23, 27, 31]])
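What is the difference between T and transpose()? Called without arguments, transpose() is equivalent to T: both reverse the order of the axes. transpose() additionally accepts an explicit permutation of the axes.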
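The relationship between the T attribute and the transpose() method can be explored with a small sketch. The three-dimensional array cube and the chosen axis order are hypothetical:

```python
import numpy as np

a = np.arange(32).reshape((8, 4))

# Without arguments, transpose() gives the same result as T.
same = np.array_equal(a.T, a.transpose())

# transpose() also accepts an explicit axis order.
cube = np.arange(24).reshape((2, 3, 4))
permuted = cube.transpose((1, 0, 2))  # swap only the first two axes
reversed_axes = cube.T                # reverse all axes
```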
Matrix Multiplication
A = ...
B = ...
np.dot(A, B)
A @ B
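A concrete version of the snippet above, using two hypothetical 2 x 2 matrices in place of the elided values. For two-dimensional arrays, np.dot() and the @ operator compute the same matrix product:

```python
import numpy as np

# Hypothetical example matrices; the original leaves A and B unspecified.
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

product_dot = np.dot(A, B)
product_at = A @ B
```

Both results are array([[19, 22], [43, 50]]).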
Swapping Axes
See swapaxes() above.