NumPy Boolean Array Indexing

From NovaOrdis Knowledge Base

Internal

Overview

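Boolean indexing uses a boolean array to select elements from another array. The boolean array must match the shape of the target array, or at least its leading dimensions, and the elements at the positions where the boolean array is True are selected.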

Boolean indexing for unidimensional arrays:

import numpy as np
a = np.array(['A', 'B', 'C', 'D'])
b = np.array([True, False, False, True])
a[b]

array(['A', 'D'], dtype='<U1')

Boolean indexing for two-dimensional arrays:

a = np.array([['A', 'B', 'C', 'D'], ['E', 'F', 'G', 'H']])
b = np.array([[True, False, True, False], [False, True, False, True]])

array([['A', 'B', 'C', 'D'],
       ['E', 'F', 'G', 'H']], dtype='<U1')
array([[ True, False,  True, False],
       [False,  True, False,  True]])

a[b]

array(['A', 'C', 'F', 'H'], dtype='<U1')

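The two-dimensional array turned into a one-dimensional array because a boolean mask can select any subset of elements, and that subset generally does not form a rectangle. NumPy therefore returns the selected elements as a one-dimensional array, in row-major order.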
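The shape change can be checked directly. The sketch below reuses the arrays from the example above and, as one possible alternative, uses np.where to keep the original shape. The '-' placeholder is an arbitrary choice made for this illustration.

```python
import numpy as np

a = np.array([['A', 'B', 'C', 'D'], ['E', 'F', 'G', 'H']])
b = np.array([[True, False, True, False], [False, True, False, True]])

# Boolean indexing returns one element per True value, as a 1-D array.
selected = a[b]
print(selected.shape)   # (4,)

# np.where keeps the (2, 4) shape by substituting a placeholder
# wherever the mask is False ('-' is an arbitrary choice).
masked = np.where(b, a, '-')
print(masked.shape)     # (2, 4)
```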

To invert a boolean array used for indexing, apply the ~ operator:

a = np.array(['A', 'B', 'C', 'D'])
b = np.array([True, False, False, True])
a[~b]

array(['B', 'C'], dtype='<U1')

Boolean arrays can be combined with the & (and) or | (or) operators when indexing. The Python keywords "and" and "or" do not work with boolean arrays:

a = np.array(['A', 'B', 'C', 'D'])
b = np.array([True, False, True, False])
b2 = np.array([False, False, False, True]) 
a[b | b2]

array(['A', 'C', 'D'], dtype='<U1')
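When the boolean arrays come from comparisons, each comparison needs its own parentheses, because & and | bind more tightly than comparison operators. A minimal sketch with made-up numeric data, which also shows the error raised by the "and" keyword:

```python
import numpy as np

a = np.array([10, 20, 30, 40])

# Parentheses are required around each comparison.
both = a[(a > 15) & (a < 35)]     # elementwise AND
either = a[(a < 15) | (a > 35)]   # elementwise OR
print(both)     # [20 30]
print(either)   # [10 40]

# The "and" keyword tries to reduce each array to a single
# truth value, which NumPy refuses for multi-element arrays.
try:
    a[(a > 15) and (a < 35)]
except ValueError as exc:
    message = str(exc)
    print(message)
```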

Selecting data from an array by boolean indexing always creates a copy of the data, even if the boolean array selects every element and the result has the same content as the original.
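The copy behavior can be verified with np.shares_memory. The following sketch contrasts a boolean selection with a basic slice, which returns a view instead. The variable names are illustrative.

```python
import numpy as np

a = np.array(['A', 'B', 'C', 'D'])
all_true = np.array([True, True, True, True])

# Same contents as a, but stored in a separate buffer.
c = a[all_true]
c[0] = 'Z'                      # modifying the copy...
print(a[0])                     # ...leaves the original unchanged: A
print(np.shares_memory(a, c))   # False

# For contrast, a basic slice is a view onto the same data.
v = a[1:3]
print(np.shares_memory(a, v))   # True
```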

If the boolean array and the target array do not have the same shape, the operation produces an IndexError:

IndexError: boolean index did not match indexed array along dimension 0; dimension is 6 but corresponding boolean dimension is 5
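A short sketch that triggers this error with a five-element boolean array applied to a six-element array. The exact wording of the message may differ between NumPy versions.

```python
import numpy as np

a = np.arange(6)
short_mask = np.array([True, False, True, False, True])  # 5 values for 6 elements

try:
    a[short_mask]
    raised = False
except IndexError as exc:
    raised = True
    print(exc)
```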

The boolean arrays used in boolean indexing can be generated with vectorized comparison.
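As an illustration, comparing an array with a scalar yields a boolean array of the same shape, which can be used as an index right away. The data below is made up for the example.

```python
import numpy as np

values = np.array([3, -1, 7, 0, -5])

# The comparison is applied element by element.
mask = values > 0
print(mask)           # [ True False  True False False]
print(mask.dtype)     # bool

# The resulting boolean array selects the positive values.
positives = values[mask]
print(positives)      # [3 7]
```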

Boolean arrays can be mixed with slices and indices when indexing (TODO).
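One possible illustration, assuming a two-dimensional array whose rows are selected by a one-dimensional boolean array while the columns are selected by a slice or an integer index. The array contents and names are made up for the example.

```python
import numpy as np

data = np.arange(12).reshape(4, 3)
row_mask = np.array([True, False, True, False])

# Boolean array alone: selects whole rows 0 and 2.
rows = data[row_mask]
print(rows.shape)   # (2, 3)

# Boolean array for the rows, slice for the columns.
tail = data[row_mask, 1:]
print(tail)         # [[1 2]
                    #  [7 8]]

# Boolean array for the rows, integer index for the column.
col = data[row_mask, 2]
print(col)          # [2 8]
```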

Boolean Indexing and Assignment

Setting values with boolean arrays substitutes the value or values on the right-hand side into the locations where the boolean array is True:

a = np.array(['A', 'B', 'C', 'D'])
b = np.array([True, False, True, False])
a[b] = 'X'

array(['X', 'B', 'X', 'D'], dtype='<U1')

The same array can be used in the expression:

a = np.array(['A', 'B', 'C', 'D'])
a[a >= 'C'] = 'X'

array(['A', 'B', 'X', 'X'], dtype='<U1')
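The right-hand side can also be a sequence that supplies one value per True position. A minimal sketch with made-up numeric data, showing both the scalar and the sequence form:

```python
import numpy as np

# A scalar on the right-hand side is broadcast to every selected position.
a = np.array([1, -2, 3, -4])
a[a < 0] = 0
print(a)   # [1 0 3 0]

# A sequence must provide exactly one value per True position;
# the values are assigned in order.
b = np.array([1, -2, 3, -4])
b[b < 0] = [20, 40]
print(b)   # [ 1 20  3 40]
```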