NumPy ndarray: Difference between revisions

From NovaOrdis Knowledge Base
 
Latest revision as of 17:30, 21 May 2024

Internal

Overview

ndarray is an N-dimensional array object. It is a fast, flexible container for large datasets in Python, and it is used to implement a Pandas Series. It allows performing mathematical operations on whole blocks of data using syntax similar to the equivalent operations between scalar elements. It also allows applying the same mathematical operation, or function, to all array elements without writing loops. This approach is called vectorization. Evaluating operations between differently sized arrays is called broadcasting. Examples are provided in the Array Arithmetic section.
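As a minimal sketch of what vectorization means in practice, the following compares an explicit Python loop with the equivalent whole-array expression. The variable names and sample values are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
prices = np.array([10.0, 20.0, 30.0])

# Loop version: one Python-level multiplication per element.
looped = np.array([p * 1.1 for p in prices])

# Vectorized version: the same scalar-style expression applied to the whole array.
vectorized = prices * 1.1

print(np.allclose(looped, vectorized))  # True
```

Both forms produce the same values; the vectorized form avoids the Python-level loop.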

ndarrays are homogeneous: all elements of an ndarray instance have the same data type. The data type is exposed by the array's dtype attribute. The array's dimensions are exposed by the shape attribute.

ndarrays can be created by converting Python data structures, using generators, or initializing blocks of memory of specified shape with specified values. Once created, array sections can be selected with indexing and slicing syntax.

ndarray Geometry

The number of dimensions is reported by the ndim attribute.

Shape. Each array also has a shape tuple that indicates the size of each dimension, along the axes 0, 1, 2, etc. The length of the shape tuple is equal to ndim.

import numpy as np

a = np.array([['A', 'B', 'C', 'D'], ['E', 'F', 'G', 'H'], ['I', 'J', 'K', 'L']])

a.ndim
2

a.shape
(3, 4)

Axes. It is useful to think of axis 0 as "rows" and axis 1 as "columns":

a = np.array([['A', 'B', 'C', 'D'], ['E', 'F', 'G', 'H'], ['I', 'J', 'K', 'L']])
                     row 0                 row 1                 row 2

    
                       Axis 1
         ┌──────────────────────────────────▶
         │         0     1     2    3                   
         │      ┌─────┬─────┬─────┬─────┐
         │      │  A  │  B  │  C  │  D  │ 
         │ a0 0 │ a0,0│ a0,1│ a0,2│ a0,3│
         │      ├─────┼─────┼─────┼─────┤
         │      │  E  │  F  │  G  │  H  │ 
  Axis 0 │ a1 1 │ a1,0│ a1,1│ a1,2│ a1,3│ 
         │      ├─────┼─────┼─────┼─────┤
         │      │  I  │  J  │  K  │  L  │
         │ a2 2 │ a2,0│ a2,1│ a2,2│ a2,3│ 
         │      └─────┴─────┴─────┴─────┘
         │         
         ▼

In multi-dimensional arrays, if you omit later indices, the returned object is a lower-dimensional array consisting of all the data along the remaining dimensions. For an array in the default (row-major) layout, this data is stored in a contiguous area of memory.

a = np.array([[['A', 'B'], ['C', 'D']], [['E', 'F'], ['G', 'H']]]) # a 2 x 2 x 2 array
a[0] # the first contiguous 2 x 2 array:

array([['A', 'B'],
       ['C', 'D']], dtype='<U1')
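The effect of supplying one, two, or all three indices on the same 2 x 2 x 2 array can be sketched as follows. Each additional index removes one axis from the result.

```python
import numpy as np

a = np.array([[['A', 'B'], ['C', 'D']], [['E', 'F'], ['G', 'H']]])

print(a.shape)      # (2, 2, 2)
print(a[0].shape)   # (2, 2): one index given, two axes remain
print(a[0, 1])      # ['C' 'D']: two indices given, one axis remains
print(a[0, 1, 0])   # C: all three indices given, a single element
```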

ndarray Creation

Convert Python Data Structures with array() and asarray()

The np.array() function takes Python data structures, such as lists, lists of lists, tuples, and other sequence types, and generates an ndarray of the corresponding shape. By default, it copies the input data. For example, a two-dimensional 3 x 3 ndarray can be created by providing a list of 3 lists, each of the enclosed lists containing 3 elements:

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

The Python data structures provided as arguments to np.array() determine the array's geometry: nested sequences will be converted to multidimensional arrays. If the structure is irregular, recent NumPy versions raise a ValueError whose message mentions an "inhomogeneous" shape. Unless a data type is explicitly provided as an argument, array() tries to infer a good data type for the array it creates.
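The following sketch contrasts a regular nested list with an irregular one. It assumes a recent NumPy release, where the irregular input is rejected with a ValueError; older releases may instead emit a deprecation warning and build an object array, which is why the outcome is recorded in a flag rather than assumed.

```python
import numpy as np

regular = np.array([[1, 2, 3], [4, 5, 6]])   # rows of equal length
print(regular.shape)                          # (2, 3)

try:
    np.array([[1, 2, 3], [4, 5]])             # rows of different lengths
    ragged_rejected = False                   # older NumPy: object array, no error
except ValueError as err:
    ragged_rejected = True                    # recent NumPy: inhomogeneous shape
    print("rejected:", err)
```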

array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0, like=None)

The data structure to generate the array from must be provided as the first argument.

To enforce a specific data type:

a = np.array(..., dtype=np.float64)

asarray() is similar to array(), with the exception that it does not copy the input if the input is already an ndarray of the requested data type.
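The difference in copy behavior can be observed directly. In this sketch, the variable names are illustrative only.

```python
import numpy as np

source = np.array([1, 2, 3])

copied = np.array(source)    # array() copies the input by default
same = np.asarray(source)    # asarray() returns the input ndarray itself

copied[0] = 100              # does not affect source
same[1] = 200                # modifies source, since it is the same object

print(same is source)        # True
print(copied is source)      # False
print(source)                # [  1 200   3]
```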

By Specifying Shape and Value

zeros(), zeros_like()

To create an array of a specific shape filled with floating-point zeroes:

a = np.zeros((2, 3))

array([[0., 0., 0.],
       [0., 0., 0.]])

zeros_like() takes another array and produces a zeros array of the same shape and data type:

a = np.array([5, 10])
b = np.zeros_like(a)

array([0, 0]) # the dtype is dtype('int64')

ones(), ones_like()

To create an array of a specific shape filled with floating-point ones:

a = np.ones((1, 2))

array([[1., 1.]])

ones_like() takes another array and produces a ones array of the same shape and data type:

a = np.array([5, 10])
b = np.ones_like(a)

array([1, 1]) # the dtype is dtype('int64')

empty(), empty_like()

numpy.empty() creates an array with the given shape without initializing the memory to any particular value. You should not rely on the values present in such an array, and you should only use the function if you intend to explicitly initialize the array.
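A minimal sketch of the intended usage pattern: allocate with empty(), then overwrite every element before reading anything. The fill value 0.5 is arbitrary.

```python
import numpy as np

buf = np.empty((2, 3))       # contents are arbitrary at this point
buf[:] = 0.5                 # explicitly initialize every element before use

like = np.empty_like(buf)    # same shape and dtype as buf, also uninitialized
like[:] = buf * 2

print(buf)
print(like)
```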

full(), full_like()

Produce an array of the given shape and data type with all values set to a given value.

a = np.full((2, 3), 5)

array([[5, 5, 5],
       [5, 5, 5]])

b = np.full_like(a, 6)

array([[6, 6, 6],
       [6, 6, 6]])

eye(), identity()

Creates a square N x N identity matrix:

a = np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])
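For comparison, identity() produces the same square matrix, while eye() also accepts a column count and a diagonal offset. A short sketch:

```python
import numpy as np

square = np.identity(3)                  # always square
print(np.array_equal(square, np.eye(3))) # True

shifted = np.eye(2, 3, k=1)              # 2 x 3, ones on the first superdiagonal
print(shifted)
# [[0. 1. 0.]
#  [0. 0. 1.]]
```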

arange()

The arange() function is the array counterpart of the Python range() function. It returns a unidimensional array populated with the values that an equivalent range() call would produce.
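A few representative calls, as a sketch. Unlike range(), arange() also accepts a non-integer step.

```python
import numpy as np

first_five = np.arange(5)            # stop only
evens = np.arange(2, 10, 2)          # start, stop, step
quarters = np.arange(0, 1, 0.25)     # floating-point step

print(first_five)   # [0 1 2 3 4]
print(evens)        # [2 4 6 8]
print(quarters)     # [0.   0.25 0.5  0.75]
```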

With Generators
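One possible approach, assuming that np.fromiter() is what this section has in mind, is to consume a generator directly into a unidimensional array. The data type must be stated explicitly because it cannot be inferred before the generator is consumed.

```python
import numpy as np

# A generator expression yielding the squares of 0..4.
squares = np.fromiter((n * n for n in range(5)), dtype=np.int64)

print(squares)   # [ 0  1  4  9 16]
```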

Element Data Type

Data types are a source of NumPy's flexibility for interacting with data coming from other systems. In most cases, they provide a mapping directly onto an underlying disk or memory representation, which makes it possible to read and write binary streams of data to disk. The numerical data types are named the same way: a type name like float or int, followed by a number indicating a number of bits per element.

The dtype ndarray attribute describes the data type of the array. Since ndarrays are homogeneous, all elements have the same data type.

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

a.dtype
dtype('int64')

The object representing a specific data type can be created with:

dt = np.dtype('float64')

The corresponding scalar type is also declared in the numpy namespace and compares equal to the dtype object:

assert np.float64 == np.dtype('float64')

Data Types

Type                                Type Code   Description
int8, uint8                         i1, u1      Signed/unsigned 1 byte integer
int16, uint16                       i2, u2      Signed/unsigned 2 byte integer
int32, uint32                       i4, u4      Signed/unsigned 4 byte integer
int64, uint64                       i8, u8      Signed/unsigned 8 byte integer
float16                             f2
float32                             f4, f
float64                             f8, d
float128                            f16, g
complex64, complex128, complex256
bool                                ?
object                              O
string_                             S           String data in NumPy is fixed size and may truncate input without warning.
unicode_                            U
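The type codes can be passed wherever a dtype is expected. The sketch below checks a few codes from the table and illustrates the silent truncation of fixed-size string data; the sample values are arbitrary.

```python
import numpy as np

# Type codes resolve to the same dtypes as the named types.
print(np.dtype('i4') == np.int32)     # True
print(np.dtype('f8') == np.float64)   # True
print(np.dtype('i2').itemsize)        # 2 (bytes per element)

small = np.array([1, 2, 3], dtype='u1')   # unsigned 1 byte integers
print(small.dtype)                        # uint8

# Fixed-size byte strings: input longer than 3 bytes is truncated without warning.
fixed = np.array(['abcdef'], dtype='S3')
print(fixed)                              # [b'abc']
```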

Casting an Array to a Different Data Type

Casting can be performed with the array method astype(). Calling astype() always creates a new array, and makes a copy of the data, even if the new data type is the same as the old data type. The conversion is NOT performed in place.

a = np.ones((1))
assert a.dtype == np.float64
b = a.astype(np.int64)

array([1])

If casting fails because the conversion cannot be performed, a ValueError exception is raised.
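Both outcomes can be sketched briefly: a cast that succeeds but loses information, and a cast that fails. The sample values are arbitrary.

```python
import numpy as np

floats = np.array([1.7, -1.7, 2.0])
ints = floats.astype(np.int64)     # the fractional part is truncated toward zero
print(ints)                        # [ 1 -1  2]
print(floats)                      # the source array is unchanged

text = np.array(["1.5", "abc"])
try:
    text.astype(np.float64)        # "abc" cannot be parsed as a number
    failed = False
except ValueError as err:
    failed = True
    print("cast failed:", err)
```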

An array of strings representing numbers can be converted to numeric form with astype():

a = np.array(["5.3", "1.1", "10.3"])
b = a.astype(np.float64)
assert b.dtype == np.float64

array([ 5.3,  1.1, 10.3])

Array Indexing and Slicing

By indexing we mean selecting a subset of an array using the index operator [...] and a single numeric index. By slicing we mean selecting a subset of an array using the index operator [...] and a slice expression A:B. Indices and slice expressions can be combined within the same index operator.

Unidimensional Array Slices

With unidimensional arrays, the slice operator selects elements similarly to the Python slice operator:

a = np.array([1, 2, 3, 4, 5])
b = a[1:4]

array([2, 3, 4])

However, unlike Python list slices, which are copies of the underlying list, NumPy ndarray slices are views into the original array, providing direct access to the underlying data. The data is not copied, and any modification to the view will be reflected in the array. The reason is that NumPy has been designed to work with very large arrays, so avoiding copying data is part of this approach.
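The contrast between a list slice and an ndarray slice can be sketched as follows. The value 99 is arbitrary.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
view = a[1:4]                      # a view, not a copy
view[0] = 99                       # writes through to a
print(a)                           # [ 1 99  3  4  5]
print(np.shares_memory(a, view))   # True

lst = [1, 2, 3, 4, 5]
part = lst[1:4]                    # a new, independent list
part[0] = 99
print(lst)                         # [1, 2, 3, 4, 5]
```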

To make a copy of the underlying data, invoke the copy() method on the slice:

a = np.ones((3), np.int64)
b = a[0:2].copy()
b[0] = 10
assert a[0] == 1 # the underlying array has not been changed

Assigning a scalar value to a slice propagates (broadcasts) that value to the entire selection.

a = np.ones((5), np.int64) # array([1, 1, 1, 1, 1])
a[1:4] = 2                 # array([1, 2, 2, 2, 1])

The bare slice [:] will assign to all values in an array:

a = np.ones((5), np.int64) # array([1, 1, 1, 1, 1])
a[:] = 2                   # array([2, 2, 2, 2, 2])

Multi-dimensional Array Indexing and Slicing

For multi-dimensional arrays, a slice returns a view into a restricted selection of the multi-dimensional array. The slice selects a range of elements along an axis. A colon by itself (:) means an entire axis. Each selected element of the slice is a smaller-dimension component.

In case of a two-dimensional array, the slice selection contains one-dimensional vectors corresponding to the slice indices:

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

It is useful to think about the following slice expression as a slice of rows ("select the first two rows of the array"):

a[:2]

  ┌───┬───┬───┐
0 │ 1 │ 2 │ 3 │     
  ├───┼───┼───┤
1 │ 4 │ 5 │ 6 │     
  ├───┼───┼───┤
  │   │   │   │     
  └───┴───┴───┘

array([[1, 2, 3],
       [4, 5, 6]])

Note that a[:2] and a[0:2] are equivalent.

Individual elements can be accessed recursively:

assert a[0][1] == 2

An equivalent notation uses commas to separate indices:

assert a[0, 1] == 2

Multiple slices can be passed like you pass multiple indices: a[1:3, 2:].

In the following case, multiple rows and multiple columns are selected:

a[:2, 1:]

        1   2  
  ┌───┬───┬───┐
0 │   │ 2 │ 3 │     
  ├───┼───┼───┤
1 │   │ 5 │ 6 │     
  ├───┼───┼───┤
  │   │   │   │     
  └───┴───┴───┘
 
array([[2, 3],
       [5, 6]])

The shape is (2, 2).

In this case, all rows but just one column are selected:

a[:, :1]

    0 
  ┌───┬───┬───┐
0 │ 1 │   │   │     
  ├───┼───┼───┤
1 │ 4 │   │   │     
  ├───┼───┼───┤
2 │ 7 │   │   │     
  └───┴───┴───┘
 
array([[1],
       [4],
       [7]])

The shape is (3, 1).

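Note that if we select multiple rows but the second argument is an index rather than a slice, we do not get a "column" of shape (3, 1). An integer index removes the indexed axis, whereas a slice preserves it, so the result is a one-dimensional array of shape (3,):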

a[:, 0]

array([1, 4, 7])
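The difference between selecting a column with a slice and with an integer index shows up in the resulting shapes. In this sketch, np.newaxis is used as one way to restore the dropped axis when a column shape is needed.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

col_as_slice = a[:, :1]    # the slice keeps axis 1
col_as_index = a[:, 0]     # the integer index drops axis 1

print(col_as_slice.shape)  # (3, 1)
print(col_as_index.shape)  # (3,)

# Re-insert an axis of length 1 to get the column shape back.
restored = a[:, 0][:, np.newaxis]
print(restored.shape)      # (3, 1)
```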

Boolean Indexing

Boolean Indexing

Fancy Indexing

Fancy Indexing

Array Methods

copy()

copy() can be used with slices to make a copy of the underlying data, instead of offering direct access to the storage of the source array.

reshape()

np.arange(32).reshape((8, 4))

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])
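Two further aspects of reshape() can be sketched here: one dimension may be given as -1 and is then inferred from the array size, and, for a contiguous array such as the output of arange(), the result is a view that shares data with the source.

```python
import numpy as np

flat = np.arange(32)
grid = flat.reshape((8, 4))
auto = flat.reshape((-1, 8))         # -1 is inferred as 32 / 8 = 4

print(auto.shape)                    # (4, 8)
print(np.shares_memory(flat, grid))  # True

grid[0, 0] = -1                      # writes through to flat
print(flat[0])                       # -1
```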

transpose()

See Transposing Arrays below.

swapaxes()

swapaxes() takes a pair of axis numbers and switches the indicated axes to rearrange the data. swapaxes() returns a view of the data without making a copy.
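A short sketch on a three-dimensional array, with dimensions chosen arbitrarily, shows how the shape and the element positions change while the data stays shared:

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))
swapped = a.swapaxes(0, 2)              # exchange the first and last axes

print(swapped.shape)                    # (4, 3, 2)
print(swapped[3, 1, 0] == a[0, 1, 3])   # True: the indices on axes 0 and 2 trade places
print(np.shares_memory(a, swapped))     # True: a view, not a copy
```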

Array Arithmetic

Vectorization

Any arithmetic operation between equal-size arrays applies the operation element-wise:

a = np.full((2, 3), 2)
b = np.full((2, 3), 3)

a + b

array([[5, 5, 5],
       [5, 5, 5]])

Arithmetic operations with scalars propagate the scalar argument to each element in the array:

a = np.full((2, 3), 2)

2 * a

array([[4, 4, 4],
       [4, 4, 4]])

Comparisons between arrays of the same shape yield Boolean arrays of the same shape.
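The same element-wise rules extend to differently shaped operands through broadcasting. The sketch below, with arbitrary sample values, adds a row vector and a column vector to a 2 x 3 array, and then compares two arrays of the same shape.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])              # shape (3,), repeated for every row
col = np.array([[100], [200]])            # shape (2, 1), repeated for every column

by_row = grid + row
by_col = grid + col
print(by_row)
# [[11 22 33]
#  [14 25 36]]
print(by_col)
# [[101 102 103]
#  [204 205 206]]

# Comparison between same-shape arrays yields a Boolean array of that shape.
larger = grid > np.array([[0, 5, 0], [5, 0, 5]])
print(larger.shape)                       # (2, 3)
```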

Vectorized Comparison

Like arithmetic operations, comparisons (such as ==) are vectorized. Applying such a comparison on an array results in a boolean array:

a = np.array(['A', 'B', 'C', 'A', 'A' ,'D'])
a == 'A'

array([ True, False, False,  True,  True, False])

Additional variations of the comparison syntax:

a != 'A'
a = np.array([1, 2, 3, 4, 5])
a > 3

array([False, False, False,  True,  True])

Note that these boolean arrays can be used in boolean indexing:

Boolean Indexing
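As a brief sketch of that usage, a boolean array placed inside the index operator selects the elements where the mask is True. The labels and values arrays are hypothetical sample data.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
mask = a > 3
print(a[mask])                  # [4 5]

labels = np.array(['A', 'B', 'C', 'A', 'A', 'D'])
values = np.array([10, 20, 30, 40, 50, 60])
selected = values[labels == 'A']
print(selected)                 # [10 40 50]
```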

Transposing Arrays

Transposing is a special form of reshaping that returns a view of the underlying data, without copying anything.

Arrays have a transpose() method.

They also have a T attribute:

a = np.arange(32).reshape((8, 4))

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

a.T

array([[ 0,  4,  8, 12, 16, 20, 24, 28],
       [ 1,  5,  9, 13, 17, 21, 25, 29],
       [ 2,  6, 10, 14, 18, 22, 26, 30],
       [ 3,  7, 11, 15, 19, 23, 27, 31]])

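The T attribute is equivalent to calling transpose() with no arguments: both reverse the order of the axes. transpose() additionally accepts an explicit permutation of the axes, which matters for arrays with more than two dimensions.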
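The relationship between T and transpose() can be checked with a short sketch. The three-dimensional array b is an arbitrary example.

```python
import numpy as np

a = np.arange(32).reshape((8, 4))
print(np.array_equal(a.T, a.transpose()))   # True: same result for the default call
print(np.shares_memory(a, a.T))             # True: a view, nothing is copied

b = np.arange(24).reshape((2, 3, 4))
print(b.T.shape)                            # (4, 3, 2): all axes reversed
print(b.transpose((1, 0, 2)).shape)         # (3, 2, 4): explicit axis permutation
```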

Matrix Multiplication

A = ...
B = ...
np.dot(A, B)
A @ B
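A concrete sketch with two small, arbitrarily chosen 2 x 2 matrices shows that np.dot() and the @ operator agree for two-dimensional arrays, and that both differ from the element-wise * operator.

```python
import numpy as np

# Hypothetical sample matrices, chosen only for illustration.
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

product = A @ B
print(product)
# [[19 22]
#  [43 50]]
print(np.array_equal(np.dot(A, B), product))   # True

elementwise = A * B          # element-wise product, not a matrix product
print(elementwise)
# [[ 5 12]
#  [21 32]]
```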

Swapping Axes

See swapaxes() above.

Universal Functions

Universal Functions

Array-Oriented Programming

Conditional Logic as Array Operations

Mathematical and Statistical Operations

Sorting

Linear Algebra

File Input/Output with Arrays