NumPy ndarray: Difference between revisions

From NovaOrdis Knowledge Base
 
(123 intermediate revisions by the same user not shown)
Line 2: Line 2:
* [[Numpy_Concepts#numpy|Numpy Concepts]]
* [[Numpy_Concepts#numpy|Numpy Concepts]]
=Overview=
=Overview=
<code>ndarray</code> is an N-dimensional array object. It is a fast, flexible container for large datasets in Python. It is used to implement a Pandas [[Pandas_Series#Overview|Series]]. It allows performing mathematical operations on whole blocks of data using syntax similar to that of the equivalent operations between scalar elements. It also allows applying the same mathematical operation, or function, to all array elements without the need to write loops. Examples are provided in the [[#Array_Arithmetic|Array Arithmetic]] section.
<code>ndarray</code> is an N-dimensional array object. It is a fast, flexible container for large datasets in Python. It is used to implement a Pandas [[Pandas_Series#Overview|Series]]. It allows performing mathematical operations on whole blocks of data using syntax similar to that of the equivalent operations between scalar elements. It also allows applying the same mathematical operation, or function, to all array elements without the need to write loops. This approach is called [[#Vectorization|vectorization]]. Evaluating operations between differently sized arrays is called broadcasting. Examples are provided in the [[#Array_Arithmetic|Array Arithmetic]] section.


<code>ndarray</code>s are homogeneous: all elements of an <code>ndarray</code> instance have the same [[#Data_Type|data type]]. <code>ndarray</code>s can be created by [[#Convert_Python_Data_Structures|converting Python data structures]], using [[NumPy_ndarray#With_Generators|generators]], or [[#By_Specifying_Shape_and_Value|initializing blocks of memory of specified shape with specified values]]. Once created, array sections can be selected with [[#Array_Indexing_and_Slicing|indexing and slicing syntax]].
<code>ndarray</code>s are homogeneous: all elements of an <code>ndarray</code> instance have the same [[#Data_Type|data type]]. The data type is exposed by the array's <code>[[#dtype|dtype]]</code> attribute. The array's dimensions are exposed by the <code>[[#shape|shape]]</code> attribute.
 
<code>ndarray</code>s can be created by [[#Convert_Python_Data_Structures|converting Python data structures]], using [[NumPy_ndarray#With_Generators|generators]], or [[#By_Specifying_Shape_and_Value|initializing blocks of memory of specified shape with specified values]]. Once created, array sections can be selected with [[#Array_Indexing_and_Slicing|indexing and slicing syntax]].


=<span id='Geometry'></span><tt>ndarray</tt> Geometry=
=<span id='Geometry'></span><tt>ndarray</tt> Geometry=


Each array has a <code>shape</code> tuple that indicates the size of each dimension, and a <span id='dtype'></span><code>dtype</code> object that describes the data type of the array.
The number of dimensions is reported by the <code>ndim</code> attribute.
 
<span id='shape'></span>'''Shape'''. Each array also has a <code>shape</code> tuple that indicates the size of each dimension, along the [[#Axes|axes]] 0, 1, 2, etc. The length of the <code>shape</code> tuple is equal to <code>ndim</code>:
<syntaxhighlight lang='py'>
import numpy as np
 
# the elements are one-character strings
a = np.array([['A', 'B', 'C', 'D'], ['E', 'F', 'G', 'H'], ['I', 'J', 'K', 'L']])
 
a.ndim   # 2
 
a.shape  # (3, 4)
</syntaxhighlight>
 
<span id='Axes'></span><span id='Axis'></span>'''Axes'''. It is useful to think of axis 0 as "rows" and axis 1 as "columns":
 
<font size=-2>
 
a = np.array([[A, B, C, D], [E, F, G, H], [I, J, K, L]])
                  row 0          row 1        row 2
   
                        Axis 1
          ┌──────────────────────────────────▶
          │        0    1    2    3                 
          │      ┌─────┬─────┬─────┬─────┐
          │      │  A  │  B  │  C  │  D  │
          │ a<sub>0</sub> 0 │ a<sub>0,0</sub>│ a<sub>0,1</sub>│ a<sub>0,2</sub>│ a<sub>0,3</sub>│
          │      ├─────┼─────┼─────┼─────┤
          │      │  E  │  F  │  G  │  H  │
  Axis 0 │ a<sub>1</sub> 1 │ a<sub>1,0</sub>│ a<sub>1,1</sub>│ a<sub>1,2</sub>│ a<sub>1,3</sub>│
          │      ├─────┼─────┼─────┼─────┤
          │      │  I  │  J  │  K  │  L  │
          │ a<sub>2</sub> 2 │ a<sub>2,0</sub>│ a<sub>2,1</sub>│ a<sub>2,2</sub>│ a<sub>2,3</sub>│
          │      └─────┴─────┴─────┴─────┘
          │       
          ▼
</font>
 
In multi-dimensional arrays, if you omit later indices, the returned object will be a lower-dimensional array consisting of all the data along the other dimensions. In the default (C-order) memory layout, that data is stored in a contiguous area of memory.
 
<syntaxhighlight lang='py'>
a = np.array([[['A', 'B'], ['C', 'D']], [['E', 'F'], ['G', 'H']]]) # a 2 x 2 x 2 array
a[0] # the first contiguous 2 x 2 array:
</syntaxhighlight>
<font size=-2>
array([['A', 'B'],
        ['C', 'D']], dtype='<U1')
</font>


=<tt>ndarray</tt> Creation=
=<tt>ndarray</tt> Creation=
==Convert Python Data Structures==
==<span id='Convert_Python_Data_Structures'></span>Convert Python Data Structures with <tt>array()</tt> and <tt>asarray()</tt>==
The <code>np.array()</code> function takes Python data structures, such as lists, lists of lists, tuples, etc. and generates an <code>ndarray</code> of the corresponding shape. For example, a bi-dimensional 3 x 3 <code>ndarray</code> can be created by providing a list of 3 lists, each of the enclosed lists containing 3 elements:
The <code>np.array()</code> function takes Python data structures, such as lists, lists of lists, tuples, and other sequence types, and generates an <code>ndarray</code> of the corresponding shape. By default, it copies the input data. For example, a bi-dimensional 3 x 3 <code>ndarray</code> can be created by providing a list of 3 lists, each of the enclosed lists containing 3 elements:


<syntaxhighlight lang='py'>
<syntaxhighlight lang='py'>
Line 24: Line 75:
         [7, 8, 9]])
         [7, 8, 9]])
</font>
</font>
The Python data structures provided as arguments to <code>np.array()</code> are interpreted according to the array's [[#Geometry|geometry]].
The Python data structures provided as arguments to <code>np.array()</code> provide the array's [[#Geometry|geometry]]: nested sequences will be converted to multidimensional arrays. If the structure is irregular, recent NumPy versions raise a <code>ValueError</code> whose message mentions an "inhomogeneous" shape. Unless a data type is explicitly provided as an argument of the function, <code>array()</code> tries to infer a good data type for the array it creates.
 
<syntaxhighlight lang='py'>
array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0, like=None)
</syntaxhighlight>
The data structure to generate the array from must be provided as the first argument.
 
To enforce a specific [[#Element_Data_Type|data type]]:
<syntaxhighlight lang='py'>
a = np.array(..., dtype=np.float64)
</syntaxhighlight>
 
<code>asarray()</code> is similar to <code>array()</code>, except that it does not copy the input if it is already an <code>ndarray</code> of the requested data type.
 
==By Specifying Shape and Value==
===<tt>zeros(), zeros_like()</tt>===
To create an array of a specific [[#shape|shape]] filled with floating-point zeroes:
<syntaxhighlight lang='py'>
a = np.zeros((2, 3))
</syntaxhighlight>
<font size=-2>
array([[0., 0., 0.],
        [0., 0., 0.]])
</font>
 
<code>zeros_like()</code> takes another array and produces a <code>zeros</code> array of the same shape and data type:
<syntaxhighlight lang='py'>
a = np.array([5, 10])
b = np.zeros_like(a)
</syntaxhighlight>
<font size=-2>
array([0, 0]) <font color=teal># the dtype is dtype('int64')</font>
</font>
 
===<tt>ones(), ones_like()</tt>===
To create an array of a specific [[#shape|shape]] filled with floating-point ones:
<syntaxhighlight lang='py'>
a = np.ones((1, 2))
</syntaxhighlight>
<font size=-2>
array([[1., 1.]])
</font>
<code>ones_like()</code> takes another array and produces a <code>ones</code> array of the same shape and data type:
<syntaxhighlight lang='py'>
a = np.array([5, 10])
b = np.ones_like(a)
</syntaxhighlight>
<font size=-2>
array([1, 1]) <font color=teal># the dtype is dtype('int64')</font>
</font>
 
===<tt>empty(), empty_like()</tt>===
<code>numpy.empty()</code> creates an array with the given shape without initializing the memory to any particular value. You should not rely on the values present in such an array, and you should only use the function if you intend to explicitly initialize the array.
===<tt>full(), full_like()</tt>===
Produce an array of the given shape and data type with all values set to a given value.
<syntaxhighlight lang='py'>
a = np.full((2, 3), 5)
</syntaxhighlight>
<font size=-2>
array([[5, 5, 5],
        [5, 5, 5]])
</font>
<syntaxhighlight lang='py'>
b = np.full_like(a, 6)
</syntaxhighlight>
<font size=-2>
array([[6, 6, 6],
        [6, 6, 6]])
</font>
===<tt>eye(), identity()</tt>===
Creates a square N x N identity matrix:
<syntaxhighlight lang='py'>
a = np.eye(5)
</syntaxhighlight>
<font size=-2>
array([[1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.]])
</font>
 
===<tt>arange()</tt>===
The <code>arange()</code> function is the array-valued counterpart of the Python <code>[[Python_Language#Generate_Number_Sequences_with_range()|range()]]</code> function. It returns a unidimensional array populated with the same values that the equivalent <code>range()</code> call would produce.


==With Generators==
==With Generators==
==By Specifying Shape and Value==


=<span id='Data_Type'></span>Element Data Type=
=<span id='Data_Type'></span>Element Data Type=
Data types are a source of NumPy's flexibility for interacting with data coming from other systems. In most cases, they provide a mapping directly onto an underlying disk or memory representation, which makes it possible to read and write binary streams of data to disk. The numerical data types are named the same way: a type name like <code>float</code> or <code>int</code>, followed by a number indicating a number of bits per element.
The <span id='dtype'></span><code>dtype</code> <code>ndarray</code> attribute describes the data type of the array. Since <code>ndarray</code>s are homogeneous, all elements have the same data type.
<syntaxhighlight lang='py'>
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]])
a.dtype  # dtype('int64') on most 64-bit platforms
</syntaxhighlight>
A <code>dtype</code> object representing a specific data type can be created with:
<syntaxhighlight lang='py'>
dt = np.dtype('float64')
</syntaxhighlight>
The corresponding scalar type is also declared in the <code>numpy</code> namespace, and it compares equal to the <code>dtype</code> object:
<syntaxhighlight lang='py'>
assert np.float64 == np.dtype('float64')
</syntaxhighlight>
==Data Types==
{| class="wikitable" style="text-align: left;"
! Type
! Type Code
! Description
|-
| <font type=menlo>int8, uint8</font>  || <font type=menlo>i1, u1</font> || Signed/unsigned 1 byte integer
|-
| <font type=menlo>int16, uint16</font> || <font type=menlo>i2, u2</font> || Signed/unsigned 2 byte integer
|-
| <font type=menlo>int32, uint32</font> || <font type=menlo>i4, u4</font> || Signed/unsigned 4 byte integer
|-
| <font type=menlo>int64, uint64</font> || <font type=menlo>i8, u8</font> || Signed/unsigned 8 byte integer
|-
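| <font type=menlo>float16</font> || <font type=menlo>f2</font> || Half-precision floating point
|-
| <font type=menlo>float32</font> || <font type=menlo>f4, f</font> || Single-precision floating point, compatible with C <code>float</code>
|-
| <font type=menlo>float64</font> || <font type=menlo>f8, d</font> || Double-precision floating point, compatible with C <code>double</code> and the Python <code>float</code>
|-
| <font type=menlo>float128</font> || <font type=menlo>f16, g</font> || Extended-precision floating point; availability and actual precision are platform-dependent
|-
| <font type=menlo>complex64, complex128, complex256</font> || <font type=menlo>c8, c16, c32</font> || Complex numbers represented by two 32-, 64- or 128-bit floats, respectively
|-
| <font type=menlo>bool</font> || <font type=menlo>?</font> || Boolean type storing <code>True</code> and <code>False</code> values
|-
| <font type=menlo>object</font> || <font type=menlo>O</font> || Python object type; a value can be any Python object
|-
| <font type=menlo>string_</font> || <font type=menlo>S</font> || Fixed-length byte string type (named <code>bytes_</code> in NumPy 2.0 and later). String data in NumPy is fixed size and may truncate input without warning.
|-
| <font type=menlo>unicode_</font> || <font type=menlo>U</font> || Fixed-length Unicode string type (named <code>str_</code> in NumPy 2.0 and later)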
|-
|}
==Casting an Array to a Different Data Type==
Casting can be performed with the array method <code>astype()</code>. Calling <code>astype()</code> always creates a new array, and makes a copy of the data, even if the new data type is the same as the old data type. The conversion is NOT performed in place.
<syntaxhighlight lang='py'>
a = np.ones((1))
assert a.dtype == np.float64
b = a.astype(np.int64)
</syntaxhighlight>
<font size=-2>
array([1])
</font>
If casting fails because the conversion cannot be done, a <code>ValueError</code> exception is raised.
An array of strings representing numbers can be converted to numeric form with <code>astype()</code>:
<syntaxhighlight lang='py'>
a = np.array(["5.3", "1.1", "10.3"])
b = a.astype(np.float64)
assert b.dtype == np.float64
</syntaxhighlight>
<font size=-2>
array([ 5.3,  1.1, 10.3])
</font>
=Array Indexing and Slicing=
=Array Indexing and Slicing=
By '''indexing''' we mean selecting a subset of an array using the index operator <code>[...]</code> and a single numeric index. By '''slicing''' we mean selecting a subset of an array using the index operator <code>[...]</code> and slice expression <code>A:B</code>. Indices and slice expressions can be combined within the same index operator.
==Unidimensional Array Slices==
With unidimensional arrays, the slice operator selects elements similarly to the [[Slicing_Lists_and_Tuples_in_Python#Overview|Python slice operator]]:
<syntaxhighlight lang='py'>
a = np.array([1, 2, 3, 4, 5])
b = a[1:4]
</syntaxhighlight>
<font size=-2>
array([2, 3, 4])
</font>
However, unlike Python list slices, which [[Slicing_Lists_and_Tuples_in_Python#Slices_Are_Copies|are copies of the underlying list]], NumPy <code>ndarray</code> slices are views into the original array, providing direct access to the underlying data. The data is not copied, and any modification to the view will be reflected in the array. NumPy has been designed to work with very large arrays, and avoiding data copies is part of that approach.
<span id='Copy'></span>To make a '''copy of the underlying data''', invoke the [[#copy()|copy()]] method on the slice:
<syntaxhighlight lang='py'>
a = np.ones((3), np.int64)
b = a[0:2].copy()
b[0] = 10
assert a[0] == 1 # the underlying array has not been changed
</syntaxhighlight>
Assigning a scalar value to a slice '''propagates''' (broadcasts) that value to the entire selection.
<syntaxhighlight lang='py'>
a = np.ones((5), np.int64) # array([1, 1, 1, 1, 1])
a[1:4] = 2                # array([1, 2, 2, 2, 1])
</syntaxhighlight>
The bare slice <code>[:]</code> will assign to all values in an array:
<syntaxhighlight lang='py'>
a = np.ones((5), np.int64) # array([1, 1, 1, 1, 1])
a[:] = 2                  # array([2, 2, 2, 2, 2])
</syntaxhighlight>
==Multi-dimensional Array Indexing and Slicing==
For multi-dimensional arrays, a slice returns a view into a restricted selection of the multi-dimensional array. The slice selects a range of elements along an [[#Axis|axis]]. A colon by itself (<code>:</code>) means an entire axis. Each selected element of the slice is a smaller-dimension component.
In case of a two-dimensional array, the slice selection contains one-dimensional vectors corresponding to the slice indices:
<syntaxhighlight lang='py'>
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
</syntaxhighlight>
<font size=-2>
array([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
</font>
It is useful to think about the following slice expression as a slice of rows ("select the first two rows of the array"):
<syntaxhighlight lang='py'>
a[:2]
</syntaxhighlight>
<font size=-2>
  ┌───┬───┬───┐
0 │ 1 │ 2 │ 3 │
  ├───┼───┼───┤
1 │ 4 │ 5 │ 6 │
  ├───┼───┼───┤
  │   │   │   │
  └───┴───┴───┘
array([[1, 2, 3],
        [4, 5, 6]])
</font>
Note that <code>a[:2]</code> and <code>a[0:2]</code> are equivalent.
Individual elements can be accessed recursively:
<syntaxhighlight lang='py'>
assert a[0][1] == 2
</syntaxhighlight>
An equivalent notation uses commas to separate indices:
<syntaxhighlight lang='py'>
assert a[0, 1] == 2
</syntaxhighlight>
Multiple slices can be passed like you pass multiple indices: <code>a[1:3, 2:]</code>.
In the following case, multiple rows and multiple columns are selected:
<syntaxhighlight lang='py'>
a[:2, 1:]
</syntaxhighlight>
<font size=-2>
        1   2
  ┌───┬───┬───┐
0 │   │ 2 │ 3 │
  ├───┼───┼───┤
1 │   │ 5 │ 6 │
  ├───┼───┼───┤
  │   │   │   │
  └───┴───┴───┘
 
array([[2, 3],
        [5, 6]])
</font>
The shape is <code>(2, 2)</code>.
In this case, multiple (all) rows, but just one column is selected:
<syntaxhighlight lang='py'>
a[:, :1]
</syntaxhighlight>
<font size=-2>
    0
  ┌───┬───┬───┐
0 │ 1 │   │   │
  ├───┼───┼───┤
1 │ 4 │   │   │
  ├───┼───┼───┤
2 │ 7 │   │   │
  └───┴───┴───┘
 
array([[1],
        [4],
        [7]])
</font>
The shape is <code>(3, 1)</code>.
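If we select all rows but the second argument is an integer index rather than a slice, we do not get a "column" of shape <code>(3, 1)</code> but a one-dimensional array of shape <code>(3,)</code>: an integer index removes the axis it is applied to, while a slice preserves it.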
<syntaxhighlight lang='py'>
a[:, 0]
</syntaxhighlight>
<font size=-2>
array([1, 4, 7])
</font>
==Boolean Indexing==
{{Internal|NumPy_Boolean_Array_Indexing#Overview|Boolean Indexing}}
==Fancy Indexing==
{{Internal|NumPy_Fancy_Array_Indexing#Overview|Fancy Indexing}}
=Array Methods=
==<tt>copy()</tt>==
Can be used with [[#Copy|slices]] to make a copy of the underlying data, instead of offering direct access to the storage of the source array.
==<tt>reshape()</tt>==
<syntaxhighlight lang='py'>
np.arange(32).reshape((8, 4))
</syntaxhighlight>
<font size=-2>
array([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23],
        [24, 25, 26, 27],
        [28, 29, 30, 31]])
</font>
==<tt>transpose()</tt>==
See [[#transpose|Transposing Arrays]] below.
==<tt>swapaxes()</tt>==
<code>swapaxes()</code> takes a pair of axis numbers and switches the indicated axes to rearrange the data. <code>swapaxes()</code> returns a view of the data without making a copy.


=Array Arithmetic=
=Array Arithmetic=
==Vectorization==
Any arithmetic operation between equal-size arrays applies the operation element-wise:
<syntaxhighlight lang='py'>
a = np.full((2, 3), 2)
b = np.full((2, 3), 3)
a + b
</syntaxhighlight>
<font size=-2>
array([[5, 5, 5],
        [5, 5, 5]])
</font>
Arithmetic operations with scalars propagate the scalar argument to each element in the array:
<syntaxhighlight lang='py'>
a = np.full((2, 3), 2)
2 * a
</syntaxhighlight>
<font size=-2>
array([[4, 4, 4],
        [4, 4, 4]])
</font>
Comparisons between arrays of the same shape yield Boolean arrays of the same shape.
===Vectorized Comparison===
Like arithmetic operations, comparisons (such as <code>==</code>) are vectorized. Applying such a comparison to an array results in a boolean array:
<syntaxhighlight lang='py'>
a = np.array(['A', 'B', 'C', 'A', 'A', 'D'])
a == 'A'
</syntaxhighlight>
<font size=-2>
array([ True, False, False,  True,  True, False])
</font>
Additional variations of the comparison syntax:
<syntaxhighlight lang='py'>
a != 'A'
</syntaxhighlight>
<syntaxhighlight lang='py'>
a = np.array([1, 2, 3, 4, 5])
a > 3
</syntaxhighlight>
<font size=-2>
array([False, False, False,  True,  True])
</font>
Note that these boolean arrays can be used in boolean indexing: {{Internal|NumPy_Boolean_Array_Indexing#Overview|Boolean Indexing}}


==Transposing Arrays==
==Transposing Arrays==
Transposing is a special form of reshaping that returns a '''view''' of the underlying data, without copying anything.
<span id='transpose'></span>Arrays have a <code>transpose()</code> method.
They also have a <code>T</code> attribute:
<syntaxhighlight lang='py'>
a = np.arange(32).reshape((8, 4))
</syntaxhighlight>
<font size=-2>
array([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23],
        [24, 25, 26, 27],
        [28, 29, 30, 31]])
</font>
<syntaxhighlight lang='py'>
a.T
</syntaxhighlight>
<font size=-2>
array([[ 0,  4,  8, 12, 16, 20, 24, 28],
        [ 1,  5,  9, 13, 17, 21, 25, 29],
        [ 2,  6, 10, 14, 18, 22, 26, 30],
        [ 3,  7, 11, 15, 19, 23, 27, 31]])
</font>
<code>T</code> is equivalent to calling <code>transpose()</code> without arguments: both reverse the order of the axes. <code>transpose()</code> additionally accepts an explicit permutation of the axes.
==Matrix Multiplication==
<syntaxhighlight lang='py'>
A = ...
B = ...
np.dot(A, B)
A @ B
</syntaxhighlight>
==Swapping Axes==
==Swapping Axes==
See <code>[[#swapaxes()|swapaxes()]]</code> above.
==Universal Functions==
==Universal Functions==
{{Internal|NumPy_Universal_Functions#Overview|Universal Functions}}
{{Internal|NumPy_Universal_Functions#Overview|Universal Functions}}
Line 43: Line 498:
==Sorting==
==Sorting==
==Linear Algebra==
==Linear Algebra==


=File Input/Output with Arrays=
=File Input/Output with Arrays=

Latest revision as of 17:30, 21 May 2024

Internal

Overview

ndarray is an N-dimensional array object. It is a fast, flexible container for large datasets in Python. It is used to implement a Pandas Series. It allows performing mathematical operations on whole blocks of data using syntax similar to that of the equivalent operations between scalar elements. It also allows applying the same mathematical operation, or function, to all array elements without the need to write loops. This approach is called vectorization. Evaluating operations between differently sized arrays is called broadcasting. Examples are provided in the Array Arithmetic section.

ndarrays are homogeneous: all elements of an ndarray instance have the same data type. The data type is exposed by the array's dtype attribute. The array's dimensions are exposed by the shape attribute.

ndarrays can be created by converting Python data structures, using generators, or initializing blocks of memory of specified shape with specified values. Once created, array sections can be selected with indexing and slicing syntax.

ndarray Geometry

The number of dimensions is reported by the ndim attribute.

Shape. Each array also has a shape tuple that indicates the size of each dimension, along the axes 0, 1, 2, etc. The length of the shape tuple is equal to ndim:

import numpy as np

# the elements are one-character strings
a = np.array([['A', 'B', 'C', 'D'], ['E', 'F', 'G', 'H'], ['I', 'J', 'K', 'L']])

a.ndim   # 2

a.shape  # (3, 4)

Axes. It is useful to think of axis 0 as "rows" and axis 1 as "columns":

a = np.array([[A, B, C, D], [E, F, G, H], [I, J, K, L]])
                 row 0          row 1         row 2

    
                       Axis 1
         ┌──────────────────────────────────▶
         │         0     1     2    3                   
         │      ┌─────┬─────┬─────┬─────┐
         │      │  A  │  B  │  C  │  D  │ 
         │ a0 0 │ a0,0│ a0,1│ a0,2│ a0,3│
         │      ├─────┼─────┼─────┼─────┤
         │      │  E  │  F  │  G  │  H  │ 
  Axis 0 │ a1 1 │ a1,0│ a1,1│ a1,2│ a1,3│ 
         │      ├─────┼─────┼─────┼─────┤
         │      │  I  │  J  │  K  │  L  │
         │ a2 2 │ a2,0│ a2,1│ a2,2│ a2,3│ 
         │      └─────┴─────┴─────┴─────┘
         │         
         ▼

In multi-dimensional arrays, if you omit later indices, the returned object will be a lower-dimensional array consisting of all the data along the other dimensions. In the default (C-order) memory layout, that data is stored in a contiguous area of memory.

a = np.array([[['A', 'B'], ['C', 'D']], [['E', 'F'], ['G', 'H']]]) # a 2 x 2 x 2 array
a[0] # the first contiguous 2 x 2 array:

array([['A', 'B'],
       ['C', 'D']], dtype='<U1')

ndarray Creation

Convert Python Data Structures with array() and asarray()

The np.array() function takes Python data structures, such as lists, lists of lists, tuples, and other sequence types, and generates an ndarray of the corresponding shape. By default, it copies the input data. For example, a bi-dimensional 3 x 3 ndarray can be created by providing a list of 3 lists, each of the enclosed lists containing 3 elements:

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

The Python data structures provided as arguments to np.array() provide the array's geometry: nested sequences will be converted to multidimensional arrays. If the structure is irregular, recent NumPy versions raise a ValueError whose message mentions an "inhomogeneous" shape. Unless a data type is explicitly provided as an argument of the function, array() tries to infer a good data type for the array it creates.

array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0, like=None)

The data structure to generate the array from must be provided as the first argument.

To enforce a specific data type:

a = np.array(..., dtype=np.float64)
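
The shape and data type inference described above can be illustrated with a minimal sketch. The sample values are made up for illustration, and the ragged-input behavior assumes NumPy 1.24 or later; older versions may build an object array and emit a warning instead.

```python
import numpy as np

# Nested sequences of equal length become a multidimensional array.
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.shape)  # (2, 3)

# Without an explicit dtype, NumPy infers one from the data:
# mixing integers and floats yields a floating-point array.
b = np.array([1, 2.5, 3])
print(b.dtype)  # float64

# An explicit dtype overrides the inference.
c = np.array([1, 2, 3], dtype=np.float64)
print(c.dtype)  # float64

# An irregular ("ragged") structure cannot be mapped to a regular shape.
# Assumption: NumPy >= 1.24, where this raises ValueError.
try:
    np.array([[1, 2, 3], [4, 5]])
except ValueError as e:
    print("ragged input rejected:", e)
```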

asarray() is similar to array(), except that it does not copy the input if it is already an ndarray of the requested data type.
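
The difference in copy behavior between array() and asarray() can be observed with a short sketch; the variable names are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array(a)    # array() copies the input by default
c = np.asarray(a)  # the input is already an ndarray, so no copy is made

b[0] = 100         # modifies the copy only
print(a[0])        # 1

c[0] = 200         # modifies a, because c and a are the same object
print(a[0])        # 200
print(c is a)      # True
```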

By Specifying Shape and Value

zeros(), zeros_like()

To create an array of a specific shape filled with floating-point zeroes:

a = np.zeros((2, 3))

array([[0., 0., 0.],
       [0., 0., 0.]])

zeros_like() takes another array and produces a zeros array of the same shape and data type:

a = np.array([5, 10])
b = np.zeros_like(a)

array([0, 0]) # the dtype is dtype('int64')

ones(), ones_like()

To create an array of a specific shape filled with floating-point ones:

a = np.ones((1, 2))

array([[1., 1.]])

ones_like() takes another array and produces a ones array of the same shape and data type:

a = np.array([5, 10])
b = np.ones_like(a)

array([1, 1]) # the dtype is dtype('int64')

empty(), empty_like()

numpy.empty() creates an array with the given shape without initializing the memory to any particular value. You should not rely on the values present in such an array, and you should only use the function if you intend to explicitly initialize the array.
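
A minimal sketch of the intended usage pattern, allocating first and initializing explicitly afterwards, could look as follows. The fill values and the name template are arbitrary choices made for this example.

```python
import numpy as np

# Allocate without initializing; the contents are arbitrary at this point.
a = np.empty((2, 3))
# Explicitly initialize every element before reading from the array.
a[:] = 7.0

# empty_like() takes the shape and dtype from an existing array.
template = np.array([[1, 2], [3, 4]])
b = np.empty_like(template)
b[:] = 0
```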

full(), full_like()

Produce an array of the given shape and data type with all values set to a given value.

a = np.full((2, 3), 5)

array([[5, 5, 5],
       [5, 5, 5]])

b = np.full_like(a, 6)

array([[6, 6, 6],
       [6, 6, 6]])

eye(), identity()

Creates a square N x N identity matrix:

a = np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])
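
The example above only shows eye(). As a hedged sketch of how the two functions relate: identity() always produces a square identity matrix, while eye() also accepts a column count and a diagonal offset k.

```python
import numpy as np

# identity(n) and eye(n) produce the same square identity matrix.
i3 = np.identity(3)
same = np.array_equal(i3, np.eye(3))
print(same)  # True

# eye() can also build a non-square matrix ...
r = np.eye(2, 3)
# ... and place the ones on a shifted diagonal (k=1: first superdiagonal).
s = np.eye(3, k=1)
```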

arange()

The arange() function is the array-valued counterpart of the Python range() function. It returns a unidimensional array populated with the same values that the equivalent range() call would produce.
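
The parallel with range() can be sketched as follows. The last call shows that, unlike range(), arange() also accepts a non-integer step.

```python
import numpy as np

a = np.arange(5)            # like range(5)
print(a)                    # [0 1 2 3 4]

b = np.arange(2, 10, 3)     # like range(2, 10, 3): start, stop, step
print(b)                    # [2 5 8]

c = np.arange(0, 1, 0.25)   # non-integer steps are allowed
print(c)                    # [0.   0.25 0.5  0.75]
```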

With Generators

Element Data Type

Data types are a source of NumPy's flexibility for interacting with data coming from other systems. In most cases, they provide a mapping directly onto an underlying disk or memory representation, which makes it possible to read and write binary streams of data to disk. The numerical data types are named the same way: a type name like float or int, followed by a number indicating a number of bits per element.

The dtype ndarray attribute describes the data type of the array. Since ndarrays are homogeneous, all elements have the same data type.

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

a.dtype  # dtype('int64') on most 64-bit platforms

A dtype object representing a specific data type can be created with:

dt = np.dtype('float64')

The corresponding scalar type is also declared in the numpy namespace, and it compares equal to the dtype object:

assert np.float64 == np.dtype('float64')
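
The direct mapping between a data type and its in-memory representation, mentioned at the beginning of this section, can be sketched as follows. The choice of int16 and the sample values are arbitrary.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int16)

# int16 means 16 bits, that is 2 bytes, per element.
print(a.dtype.itemsize)  # 2
print(a.nbytes)          # 6

# The type code 'i2' names the same data type.
print(np.dtype('i2') == np.int16)  # True

# The raw bytes can be written out and interpreted again with the same dtype.
raw = a.tobytes()
restored = np.frombuffer(raw, dtype=np.int16)
print(np.array_equal(a, restored))  # True
```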

Data Types

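Type                              | Type Code    | Description
int8, uint8                       | i1, u1       | Signed/unsigned 1 byte integer
int16, uint16                     | i2, u2       | Signed/unsigned 2 byte integer
int32, uint32                     | i4, u4       | Signed/unsigned 4 byte integer
int64, uint64                     | i8, u8       | Signed/unsigned 8 byte integer
float16                           | f2           | Half-precision floating point
float32                           | f4, f        | Single-precision floating point, compatible with C float
float64                           | f8, d        | Double-precision floating point, compatible with C double and the Python float
float128                          | f16, g       | Extended-precision floating point; availability and actual precision are platform-dependent
complex64, complex128, complex256 | c8, c16, c32 | Complex numbers represented by two 32-, 64- or 128-bit floats, respectively
bool                              | ?            | Boolean type storing True and False values
object                            | O            | Python object type; a value can be any Python object
string_                           | S            | Fixed-length byte string type (named bytes_ in NumPy 2.0 and later). String data in NumPy is fixed size and may truncate input without warning.
unicode_                          | U            | Fixed-length Unicode string type (named str_ in NumPy 2.0 and later)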
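
The silent truncation of fixed-size string data noted in the table can be reproduced with a short sketch; the sample strings are arbitrary.

```python
import numpy as np

# The element size is inferred from the longest input string: 2 characters.
a = np.array(['ab', 'cd'])
print(a.dtype.kind, a.dtype.itemsize)  # U 8 (2 characters x 4 bytes each)

# Assigning a longer string truncates it without any warning.
a[0] = 'abcdef'
print(a)  # ['ab' 'cd']

# The same applies to byte strings with an explicit 2-byte size.
b = np.array([b'xyz'], dtype='S2')
print(b)  # [b'xy']
```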

Casting an Array to a Different Data Type

Casting can be performed with the array method astype(). Calling astype() always creates a new array, and makes a copy of the data, even if the new data type is the same as the old data type. The conversion is NOT performed in place.

a = np.ones((1))
assert a.dtype == np.float64
b = a.astype(np.int64)

array([1])

If casting fails because the conversion cannot be done, a ValueError exception is raised.
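
A small sketch of both outcomes, a cast that succeeds but loses information and a cast that fails, might look like this. The sample values are illustrative.

```python
import numpy as np

# Casting floats to integers succeeds, but the fractional part is discarded.
f = np.array([1.7, -1.7, 3.0])
i = f.astype(np.int64)
print(i)  # [ 1 -1  3]

# Casting a string that does not represent a number raises ValueError.
s = np.array(['1.5', 'abc'])
try:
    s.astype(np.float64)
    failed = False
except ValueError as e:
    failed = True
    print("cast failed:", e)
```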

An array of strings representing numbers can be converted to numeric form with astype():

a = np.array(["5.3", "1.1", "10.3"])
b = a.astype(np.float64)
assert b.dtype == np.float64

array([ 5.3,  1.1, 10.3])

Array Indexing and Slicing

By indexing we mean selecting a subset of an array using the index operator [...] and a single numeric index. By slicing we mean selecting a subset of an array using the index operator [...] and slice expression A:B. Indices and slice expressions can be combined within the same index operator.

Unidimensional Array Slices

With unidimensional arrays, the slice operator selects elements similarly to the Python slice operator:

a = np.array([1, 2, 3, 4, 5])
b = a[1:4]

array([2, 3, 4])

However, unlike Python list slices, which are copies of the underlying list, NumPy ndarray slices are views into the original array, providing direct access to the underlying data. The data is not copied, and any modification to the view will be reflected in the array. NumPy has been designed to work with very large arrays, and avoiding data copies is part of that approach.
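
The contrast between a list slice (a copy) and an ndarray slice (a view) can be demonstrated with a minimal sketch:

```python
import numpy as np

# A Python list slice is a copy: changing it leaves the list untouched.
lst = [0, 1, 2, 3, 4]
sub = lst[1:4]
sub[0] = 99
print(lst)  # [0, 1, 2, 3, 4]

# An ndarray slice is a view: changing it changes the original array.
a = np.arange(5)
b = a[1:4]
b[0] = 99
print(a)    # [ 0 99  2  3  4]

# The view and the array share the same underlying memory.
print(np.shares_memory(a, b))  # True
```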

To make a copy of the underlying data, invoke the copy() method on the slice:

a = np.ones((3), np.int64)
b = a[0:2].copy()
b[0] = 10
assert a[0] == 1 # the underlying array has not been changed

Assigning a scalar value to a slice propagates (broadcasts) that value to the entire selection.

a = np.ones((5), np.int64) # array([1, 1, 1, 1, 1])
a[1:4] = 2                 # array([1, 2, 2, 2, 1])

The bare slice [:] will assign to all values in an array:

a = np.ones((5), np.int64) # array([1, 1, 1, 1, 1])
a[:] = 2                   # array([2, 2, 2, 2, 2])

Multi-dimensional Array Indexing and Slicing

For multi-dimensional arrays, a slice returns a view into a restricted selection of the multi-dimensional array. The slice selects a range of elements along an axis. A colon by itself (:) means an entire axis. Each selected element of the slice is a smaller-dimension component.

In case of a two-dimensional array, the slice selection contains one-dimensional vectors corresponding to the slice indices:

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

It is useful to think about the following slice expression as a slice of rows ("select the first two rows of the array"):

a[:2]

  ┌───┬───┬───┐
0 │ 1 │ 2 │ 3 │     
  ├───┼───┼───┤
1 │ 4 │ 5 │ 6 │     
  ├───┼───┼───┤
  │   │   │   │     
  └───┴───┴───┘

array([[1, 2, 3],
       [4, 5, 6]])

Note that a[:2] and a[0:2] are equivalent.

Individual elements can be accessed recursively:

assert a[0][1] == 2

An equivalent notation uses commas to separate indices:

assert a[0, 1] == 2

Multiple slices can be passed like you pass multiple indices: a[1:3, 2:].

In the following case, multiple rows and multiple columns are selected:

a[:2, 1:]

        1   2  
  ┌───┬───┬───┐
0 │   │ 2 │ 3 │     
  ├───┼───┼───┤
1 │   │ 5 │ 6 │     
  ├───┼───┼───┤
  │   │   │   │     
  └───┴───┴───┘
 
array([[2, 3],
       [5, 6]])

The shape is (2, 2).

In this case, multiple (all) rows, but just one column is selected:

a[:, :1]

    0 
  ┌───┬───┬───┐
0 │ 1 │   │   │     
  ├───┼───┼───┤
1 │ 4 │   │   │     
  ├───┼───┼───┤
2 │ 7 │   │   │     
  └───┴───┴───┘
 
array([[1],
       [4],
       [7]])

The shape is (3, 1).

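If we select all rows but the second argument is an integer index rather than a slice, we do not get a "column" of shape (3, 1) but a one-dimensional array of shape (3,): an integer index removes the axis it is applied to, while a slice preserves it.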

a[:, 0]

array([1, 4, 7])
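
The effect of mixing integer indices and slices on the shape of the result can be checked with a short sketch. The use of np.newaxis to restore the dropped axis is an addition that does not appear in the original text.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# An integer index removes the indexed axis: the result is one-dimensional.
col_1d = a[:, 0]
print(col_1d.shape)  # (3,)

# A slice keeps the axis, even when it selects a single column.
col_2d = a[:, :1]
print(col_2d.shape)  # (3, 1)

# The same rule applies to rows.
print(a[1].shape)    # (3,)
print(a[1:2].shape)  # (1, 3)

# np.newaxis re-inserts an axis of length 1 where needed.
restored = col_1d[:, np.newaxis]
print(restored.shape)  # (3, 1)
```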

Boolean Indexing

Boolean Indexing

Fancy Indexing

Fancy Indexing

Array Methods

copy()

Can be used with slices to make a copy of the underlying data, instead of offering direct access to the storage of the source array.

reshape()

np.arange(32).reshape((8, 4))

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])
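
A few properties of reshape() that the example above does not show can be sketched as follows. This assumes a contiguous source array, as produced by arange(), in which case reshape() returns a view.

```python
import numpy as np

a = np.arange(32)
b = a.reshape((8, 4))

# One dimension may be given as -1 and is then inferred from the size.
c = a.reshape((8, -1))
print(c.shape)  # (8, 4)

# For a contiguous array like this one, the result is a view of the same data.
b[0, 0] = 100
print(a[0])     # 100

# The new shape must account for exactly the same number of elements.
try:
    a.reshape((5, 7))
    rejected = False
except ValueError:
    rejected = True
```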

transpose()

See Transposing Arrays below.

swapaxes()

swapaxes() takes a pair of axis numbers and switches the indicated axes to rearrange the data. swapaxes() returns a view of the data without making a copy.
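
As a sketch of the behavior described above, swapping the first and last axes of a 2 x 3 x 4 array (an arbitrary example shape) gives a 4 x 3 x 2 view over the same data:

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))
b = a.swapaxes(0, 2)
print(b.shape)  # (4, 3, 2)

# The element at a[i, j, k] is found at b[k, j, i].
print(a[1, 2, 3], b[3, 2, 1])  # 23 23

# The result is a view: writing through it changes the original array.
b[0, 0, 0] = -1
print(a[0, 0, 0])  # -1
```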

Array Arithmetic

Vectorization

Any arithmetic operation between equal-size arrays applies the operation element-wise:

a = np.full((2, 3), 2)
b = np.full((2, 3), 3)

a + b

array([[5, 5, 5],
       [5, 5, 5]])

Arithmetic operations with scalars propagate the scalar argument to each element in the array:

a = np.full((2, 3), 2)

2 * a

array([[4, 4, 4],
       [4, 4, 4]])
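
The Overview mentions that operations between differently sized arrays are evaluated by broadcasting. A minimal sketch, with arbitrary sample values, of a row and a column being broadcast against a 2 x 3 array:

```python
import numpy as np

a = np.full((2, 3), 2)

# A one-dimensional array of length 3 is applied to every row.
row = np.array([10, 20, 30])
by_row = a + row
print(by_row)
# [[12 22 32]
#  [12 22 32]]

# A (2, 1) array is applied to every column.
col = np.array([[100], [200]])
by_col = a + col
print(by_col)
# [[102 102 102]
#  [202 202 202]]
```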

Comparisons between arrays of the same shape yield Boolean arrays of the same shape.

Vectorized Comparison

Like arithmetic operations, comparisons (such as ==) are vectorized. Applying such a comparison to an array results in a boolean array:

a = np.array(['A', 'B', 'C', 'A', 'A', 'D'])
a == 'A'

array([ True, False, False,  True,  True, False])

Additional variations of the comparison syntax:

a != 'A'

a = np.array([1, 2, 3, 4, 5])
a > 3

array([False, False, False,  True,  True])

Note that these boolean arrays can be used in boolean indexing:

Boolean Indexing
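
As a brief sketch of how a boolean array produced by a comparison can be used to select elements (the linked page covers the topic in detail):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
mask = a > 3
print(mask)        # [False False False  True  True]

# The mask selects the elements where it is True.
selected = a[mask]
print(selected)    # [4 5]

# ~ negates a mask; & combines two conditions element-wise.
others = a[~mask]
middle = a[(a > 1) & (a < 5)]
print(others)      # [1 2 3]
print(middle)      # [2 3 4]
```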

Transposing Arrays

Transposing is a special form of reshaping that returns a view of the underlying data, without copying anything.

Arrays have a transpose() method.

They also have a T attribute:

a = np.arange(32).reshape((8, 4))

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

a.T

array([[ 0,  4,  8, 12, 16, 20, 24, 28],
       [ 1,  5,  9, 13, 17, 21, 25, 29],
       [ 2,  6, 10, 14, 18, 22, 26, 30],
       [ 3,  7, 11, 15, 19, 23, 27, 31]])

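T is equivalent to calling transpose() without arguments: both reverse the order of the axes. transpose() additionally accepts an explicit permutation of the axes.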
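
The relationship between T and transpose() can be sketched on a three-dimensional array; the 2 x 3 x 4 shape is an arbitrary choice for this example.

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))

# T and transpose() without arguments both reverse the axis order.
print(a.T.shape)                          # (4, 3, 2)
print(np.array_equal(a.T, a.transpose())) # True

# transpose() also accepts an explicit axis permutation:
# here axes 0 and 1 are exchanged and axis 2 stays in place.
p = a.transpose((1, 0, 2))
print(p.shape)                            # (3, 2, 4)
print(p[2, 1, 3] == a[1, 2, 3])           # True

# In all cases the result is a view of the same data.
print(np.shares_memory(a, p))             # True
```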

Matrix Multiplication

A = ...
B = ...
np.dot(A, B)
A @ B
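
A concrete instance of the two equivalent notations, with small matrices chosen only for illustration (the original leaves A and B unspecified):

```python
import numpy as np

# Hypothetical sample matrices: A is 2 x 3, B is 3 x 2.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])

# For two-dimensional arrays, np.dot() and the @ operator
# both compute the matrix product.
C = np.dot(A, B)
D = A @ B
print(D)
# [[ 4  5]
#  [10 11]]
```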

Swapping Axes

See swapaxes() above.

Universal Functions

Universal Functions

Array-Oriented Programming

Conditional Logic as Array Operations

Mathematical and Statistical Operations

Sorting

Linear Algebra

File Input/Output with Arrays