Merge pull request #178 from asmeurer/docs-fixes
Small fixes to the indexing guide
asmeurer authored May 24, 2024
2 parents 0ba06d3 + 94100f0 commit a1379ac
Showing 7 changed files with 136 additions and 86 deletions.
1 change: 1 addition & 0 deletions docs/indexing-guide/integer-indices.md
@@ -140,6 +140,7 @@ Therefore, negative indices are primarily a syntactic convenience that
allows one to specify parts of a list that would otherwise need to be
specified in terms of the size of the list.

(integer-indices-bounds-checking)=
If an integer index is greater than or equal to the size of the list, or less
than negative the size of the list (`i >= len(a)` or `i < -len(a)`), then it
is out of bounds and will raise an `IndexError`.
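
A minimal sketch of this bounds rule in plain Python; `in_bounds` is a hypothetical helper written only for illustration, and the list values are made up:

```python
def in_bounds(a, i):
    # Hypothetical helper: an integer index i is valid for a
    # exactly when -len(a) <= i < len(a)
    return -len(a) <= i < len(a)

a = [100, 101, 102, 103]

for i in [3, -4, 4, -5]:
    try:
        a[i]
        raised = False
    except IndexError:
        raised = True
    # The helper agrees with what Python actually does
    assert raised == (not in_bounds(a, i))
    print(i, in_bounds(a, i))
```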
Original file line number Diff line number Diff line change
@@ -671,7 +671,7 @@ Or if it had no actual `0`s:[^0-d-mask-footnote]
with the shape `(0,)` to the shape `(0,)`, and so this is what gets
assigned, i.e., "nothing" (of shape `(0,)`) gets assigned to "nothing" (of
matching shape `(0,)`). This is one reason why [broadcasting
rules](broadcasting) apply even to dimensions of size 0.
rules](broadcasting) apply even to dimensions of size `0`.
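
To illustrate that size `0` dimensions follow the same broadcasting rules as any other size, here is a small sketch using `np.broadcast_shapes` (available in NumPy 1.20 and later); the array values are arbitrary:

```python
import numpy as np

# A size 0 dimension broadcasts with a size 1 dimension...
print(np.broadcast_shapes((0,), (1,)))  # (0,)
# ...and with another size 0 dimension...
print(np.broadcast_shapes((0,), (0,)))  # (0,)
# ...but not with any other size, since the sizes differ and neither is 1
try:
    np.broadcast_shapes((0,), (2,))
except ValueError as e:
    print("ValueError:", e)

# Assigning "nothing" to "nothing": the mask selects no elements, so the
# selection has shape (0,), and a shape (0,) value broadcasts to it
a = np.asarray([1, 1, 2])
mask = a > 5
a[mask] = np.array([], dtype=a.dtype)
print(a)  # [1 1 2]
```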

```py
>>> a = np.asarray([1, 1, 2])
146 changes: 97 additions & 49 deletions docs/indexing-guide/multidimensional-indices/integer-arrays.md
@@ -1,7 +1,7 @@
# Integer Array Indices

```{note}
In this section, and [the next](boolean-arrays), do not confuse the *array
In this section and [the next](boolean-arrays), do not confuse the *array
being indexed* with the *array that is the index*. The former can be anything
and have any dtype. It is only the latter that is restricted to being integer
or boolean.
@@ -35,7 +35,7 @@ elements of the array in order (or possibly [reversed order](negative-steps)
for slices), whereas this array has elements completely shuffled from `a`, and
some are even repeated.

However, we could "cheat" a bit here, and do something like
However, we could "cheat" a bit here and do something like

```py
>>> new_array = np.array([[a[0], a[2], a[0]],
@@ -47,7 +47,7 @@ array([[100, 102, 100],

This is the array we want. We sort of constructed it using only indexing
operations, but we didn't actually do `a[idx]` for some index `idx`. Instead,
we just listed the index of each individual element.
we just listed the indices of each individual element.

An integer array index is essentially this "cheating" method, but as a single
index. Instead of listing out `a[0]`, `a[2]`, and so on, we just create a
@@ -78,25 +78,19 @@ Note that `a[idx]` above is not the same size as `a` at all. `a` has 4
elements and is 1-dimensional, whereas `a[idx]` has 6 elements and is
2-dimensional. `a[idx]` also contains some duplicate elements from `a`, and
there are some elements which aren't selected at all. Indeed, we could take
*any* integer array of any shape, and as long as the elements are between 0
and 3, `a[idx]` would create a new array with the same shape as `idx` with
corresponding elements selected from `a`.
*any* integer array `idx` of any shape, and as long as the elements are
between 0 and 3, `a[idx]` would create a new array with the same shape as
`idx` with corresponding elements selected from `a`.
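
A quick sketch of this claim, assuming `a = np.array([100, 101, 102, 103])` (the same values used elsewhere in this guide); the randomly generated `idx` is only for illustration:

```python
import numpy as np

a = np.array([100, 101, 102, 103])

# Any integer array whose entries are between 0 and 3 works as an index for a
rng = np.random.default_rng(0)
idx = rng.integers(0, 4, size=(5, 2))  # upper bound 4 is exclusive

result = a[idx]
print(result.shape)  # (5, 2), the same as idx.shape

# Each element of the result is the element of a selected by the
# corresponding entry of idx
for i in range(idx.shape[0]):
    for j in range(idx.shape[1]):
        assert result[i, j] == a[idx[i, j]]
```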

A useful way to think about integer array indexing is that it generalizes
[integer indexing](../integer-indices.md). With integer indexing, we are
effectively indexing using a 0-dimensional integer array, that is, a single
integer.[^integer-scalar-footnote] This always selects the corresponding
element from the given axis and removes the dimension. That is, it replaces
that dimension in the shape with `()`, the "shape" of the integer index.

Similarly,
The shape of `a` is `(4,)` and the shape of `a[idx]` is `(2, 3)`, the same as the
shape of `idx`. In general:

> **an integer array index `a[idx]` selects elements from the specified axis
> and replaces the dimension in the shape with the shape of the index array
> `idx`.**
> and replaces the selected dimension in the shape of `a` with the shape of
> the index array `idx`.**

For example:
For example, in `a[idx].shape`, `4` is replaced with `(2, 3)`. Consider what
happens when `a` has more than one dimension:

```
>>> a = np.empty((3, 4))
@@ -107,13 +101,67 @@ For example:
(3, 2, 2)
```

In particular, even when the index array `idx` has more than one dimension, an
Here `a.shape` is `(3, 4)` and `idx.shape` is `(2, 2)`. In `a[idx].shape`, the
`3` is replaced with `(2, 2)`, giving `(2, 2, 4)`, and in `a[:, idx].shape`,
the `4` is replaced with `(2, 2)`, giving `(3, 2, 2)`.
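
The shape rule can be checked mechanically. In this sketch, `predicted_shape` is a hypothetical helper, and `idx` is chosen here so that its entries are valid for both axes of `a`:

```python
import numpy as np

def predicted_shape(shape, axis, idx_shape):
    # Hypothetical helper: replace shape[axis] with the shape of the index array
    return shape[:axis] + idx_shape + shape[axis + 1:]

a = np.empty((3, 4))
idx = np.array([[0, 1], [1, 2]])  # shape (2, 2); entries valid for both axes

print(a[idx].shape)     # (2, 2, 4)
print(a[:, idx].shape)  # (3, 2, 2)

assert a[idx].shape == predicted_shape(a.shape, 0, idx.shape)
assert a[:, idx].shape == predicted_shape(a.shape, 1, idx.shape)
```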

A useful way to think about integer array indexing is that it generalizes
[integer indexing](../integer-indices.md). With integer indexing, we are
effectively indexing using a 0-dimensional integer array, that is, a single
integer. This always selects the corresponding element from the given axis and
removes the dimension. That is, it replaces that dimension in the shape with
`()` (i.e., nothing), the "shape" of the integer index. The result of indexing
with an `int` and a corresponding 0-D array is exactly the
same.[^integer-scalar-footnote]

[^integer-scalar-footnote]:
<!-- This is the only way to cross reference a footnote across documents -->
(integer-scalar-footnote-ref)=

There is one difference between `a[0]` and `a[asarray(0)]`. The
latter is considered an advanced index, so it does not create a
[view](views-vs-copies):

```py
>>> a = np.empty((2, 3))
>>> a[0].base is a
True
>>> print(a[np.array(0)].base)
None
```

In ndindex,
[`IntegerArray.reduce()`](ndindex.IntegerArray.reduce) will always convert
a 0-D array index into an [`Integer`](ndindex.integer.Integer).

```py
>>> idx = np.asarray(0) # 0-D array
>>> idx.shape
()
>>> a = np.arange(12).reshape((3, 4))
>>> a[idx].shape # replaces (3,) with ()
(4,)
>>> a[:, idx].shape # replaces (4,) with ()
(3,)
>>> a[idx] # a[asarray(0)] is the exact same as a[0]
array([0, 1, 2, 3])
>>> a[0]
array([0, 1, 2, 3])
```

Note that even when the index array `idx` has more than one dimension, an
integer array index still only selects elements from a single axis of `a`. It
would appear that this limits the ability to arbitrarily shuffle elements of
`a` using integer indexing. For instance, suppose we want to create the array
`[105, 100]` from the above 2-D `a`. Based on the above examples, it might not
seem possible, since the elements `105` and `100` are not in the same row or
column of `a`.
`a` using integer indexing. For instance, suppose we have the 2-D array

```
>>> a = [[100, 101, 102],
... [103, 104, 105]]
```

and we wanted to use indexing to create the array `[105, 100]`. Based on the
above examples, this might not seem possible, since the elements `105` and
`100` are not in the same row or column of `a`.

However, this is doable by providing multiple integer array
indices:
@@ -285,6 +333,28 @@ array([100, 101, 103])
array([100, 101, 103])
```
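
As a sketch of the multiple-index approach for the `[105, 100]` example, we first convert the nested list to a NumPy array (a plain Python list does not support this kind of indexing) and give one index array per axis:

```python
import numpy as np

a = np.array([[100, 101, 102],
              [103, 104, 105]])

# One index array per axis: the i-th selected element is a[rows[i], cols[i]]
rows = np.array([1, 0])
cols = np.array([2, 0])
print(a[rows, cols])  # [105 100]
```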

### Bounds Checking

As with [integer indices](../integer-indices.md), integer array indexing uses
bounds checking, with the [same rule as integer
indices](integer-indices-bounds-checking).

> **If any entry in an integer array index is greater than `size - 1` or less
> than `-size`, where `size` is the size of the dimension being indexed, an
> `IndexError` is raised.**
```py
>>> a = np.array([100, 101, 102, 103]) # as above
>>> a[[2, 3, 4]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index 4 is out of bounds for axis 0 with size 4
>>> a[[-5, -4, -3]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index -5 is out of bounds for axis 0 with size 4
```
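
The rule can also be expressed as a vectorized check. `out_of_bounds` below is a hypothetical helper, not a NumPy or ndindex function; the sketch only confirms that it agrees with when NumPy raises `IndexError`:

```python
import numpy as np

def out_of_bounds(idx, size):
    # Hypothetical helper mirroring the rule above: an entry is out of bounds
    # if it is greater than size - 1 or less than -size
    idx = np.asarray(idx)
    return (idx > size - 1) | (idx < -size)

a = np.array([100, 101, 102, 103])
for idx in [[2, 3, 4], [-5, -4, -3], [-4, 0, 3]]:
    bad = out_of_bounds(idx, a.shape[0])
    try:
        a[idx]
        raised = False
    except IndexError:
        raised = True
    # NumPy raises exactly when at least one entry is out of bounds
    assert raised == bool(bad.any())
```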

(integer-array-broadcasting)=
### Broadcasting

@@ -360,28 +430,6 @@ The ndindex methods
[`expand()`](ndindex.Tuple.expand) will broadcast array indices together into
a canonical form.

[^integer-scalar-footnote]:
<!-- This is the only way to cross reference a footnote across documents -->
(integer-scalar-footnote-ref)=

In fact, if the integer array index itself has
shape `()`, then the behavior is identical to simply using an `int` with
the same value. So it's a true generalization. In ndindex,
[`IntegerArray.reduce()`](ndindex.IntegerArray.reduce) will always convert
a 0-D array index into an [`Integer`](ndindex.integer.Integer).

However, there is one difference between `a[0]` and `a[asarray(0)]`. The
latter is considered an advanced index, so it does not create a
[view](views-vs-copies):

```py
>>> a = np.empty((2, 3))
>>> a[0].base is a
True
>>> print(a[np.array(0)].base)
None
```

(outer-indexing)=
#### Outer Indexing

@@ -474,8 +522,8 @@ axis, i.e., exactly the arrays we want.

This is why NumPy automatically broadcasts integer array indices together.

> **Outer indexing arrays can be constructed by inserting size-1 dimensions
> into the desired "outer" integer array indices so that the non-size-1
> **Outer indexing arrays can be constructed by inserting size `1` dimensions
> into the desired "outer" integer array indices so that the non-size `1`
> dimension for each is in the indexing dimension.**
For example,
@@ -492,7 +540,7 @@ Here, we use [newaxis](newaxis.md) along with `:` to turn `idx0` and
`idx1` into shape `(2, 1)` and `(1, 3)` arrays, respectively. These then
automatically broadcast together to give the desired outer index.

This "insert size-1 dimensions" operation can also be performed automatically
This "insert size `1` dimensions" operation can also be performed automatically
with the {external+numpy:func}`numpy.ix_` function.[^ix-footnote]
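
A minimal sketch comparing the manual `newaxis` approach with `np.ix_`, using made-up index arrays `idx0` (length 2) and `idx1` (length 3) on the 2-D `a` from the examples above:

```python
import numpy as np

a = np.array([[100, 101, 102],
              [103, 104, 105]])
idx0 = np.array([1, 0])     # rows to select
idx1 = np.array([2, 0, 1])  # columns to select

# Manually insert size 1 dimensions: shapes (2, 1) and (1, 3) broadcast to (2, 3)
manual = a[idx0[:, np.newaxis], idx1[np.newaxis, :]]

# np.ix_ inserts the same size 1 dimensions for us
auto = a[np.ix_(idx0, idx1)]

print(manual)
assert np.array_equal(manual, auto)
```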

[^ix-footnote]: `ix_()` is currently limited to only support 1-D input arrays
@@ -599,7 +647,7 @@ For example, consider:
... [103, 104, 105]]])
```

This is the same `a` as in the above examples, except it has an extra size-1
This is the same `a` as in the above examples, except it has an extra size `1`
dimension:

```py
34 changes: 17 additions & 17 deletions docs/indexing-guide/multidimensional-indices/newaxis.md
@@ -15,11 +15,11 @@ None
True
```

`newaxis`, as the name suggests, adds a new axis. This new axis has size `1`.
The new axis is added at the corresponding location within the array. A size
`1` axis neither adds nor removes any elements from the array. Using the
[nested lists analogy](what-is-an-array.md), it essentially adds a new "layer"
to the list of lists.
`newaxis`, as the name suggests, adds a new axis to an array. This new axis
has size `1`. The new axis is added at the corresponding location within the
array shape. A size `1` axis neither adds nor removes any elements from the
array. Using the [nested lists analogy](what-is-an-array.md), it essentially
adds a new "layer" to the list of lists.
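
A short sketch of where the size `1` axis lands, using an arbitrary `(2, 3)` array:

```python
import numpy as np

a = np.arange(6).reshape((2, 3))

print(a[np.newaxis].shape)        # (1, 2, 3)
print(a[:, np.newaxis].shape)     # (2, 1, 3)
print(a[:, :, np.newaxis].shape)  # (2, 3, 1)

# The number of elements never changes
assert a[np.newaxis].size == a.size == 6
```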


```py
@@ -77,7 +77,7 @@ within the index `a[0, :2]`:

In each case, the exact same elements are selected: `0` always targets the
first axis, and `:2` always targets the second axis. The only difference is
where the size-1 axis is inserted:
where the size `1` axis is inserted:

```py
>>> a[np.newaxis, 0, :2]
@@ -127,11 +127,11 @@ its position in the tuple index after removing any `newaxis` indices.
Equivalently, `newaxis` indices can be thought of as adding new axes *after*
the existing axes are indexed.

A size-1 axis can always be inserted anywhere in an array's shape without
A size `1` axis can always be inserted anywhere in an array's shape without
changing the underlying elements.

An array index can include multiple instances of `newaxis` (or `None`). Each
will add a size-1 axis in the corresponding location.
will add a size `1` axis in the corresponding location.

**Exercise:** Can you determine the shape of this array, given that `a.shape`
is `(3, 2, 4)`?
@@ -151,7 +151,7 @@ a[np.newaxis, 0, newaxis, :2, newaxis, ..., newaxis]

In summary,

> **`np.newaxis` (which is just an alias for `None`) inserts a new size-1 axis
> **`np.newaxis` (which is just an alias for `None`) inserts a new size `1` axis
in the corresponding location in the tuple index. The remaining,
non-`newaxis` indices in the tuple index are indexed as if the `newaxis`
indices were not there.**
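
A sketch of this summary, using an arbitrary array of shape `(3, 2, 4)`:

```python
import numpy as np

a = np.arange(24).reshape((3, 2, 4))

with_newaxis = a[np.newaxis, 0, np.newaxis, :2]
without_newaxis = a[0, :2]

print(without_newaxis.shape)  # (2, 4)
print(with_newaxis.shape)     # (1, 1, 2, 4)

# Removing the inserted size 1 axes recovers exactly the same elements
assert np.array_equal(with_newaxis.squeeze(axis=(0, 1)), without_newaxis)
```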
@@ -184,22 +184,22 @@ array([[ 0],
`(3, 1)` column vector.

But the most common usage is due to [broadcasting](broadcasting). The key idea
of broadcasting is that size-1 dimensions are not directly useful, in the
of broadcasting is that size `1` dimensions are not directly useful, in the
sense that they could be removed without actually changing anything about the
underlying data in the array. So they are used as a signal that that dimension
can be repeated in operations. `newaxis` is therefore useful for inserting
these size-1 dimensions in situations where you want to force your data to be
repeated. For example, suppose we have the two arrays
these size `1` dimensions in situations where you want to force your data to
be repeated. For example, suppose we have the two arrays

```py
>>> x = np.array([1, 2, 3])
>>> y = np.array([100, 200])
```

and suppose we want to compute an "outer" sum of `x` and `y`, that is, we want
to compute every combination of `i + j` where `i` is from `x` and `j` is from
to compute every combination of `a + b` where `a` is from `x` and `b` is from
`y`. The key realization here is that what we want is simply to
repeat each entry of `x` 3 times, to correspond to each entry of `y`, and
repeat each entry of `x` 2 times, to correspond to each entry of `y`, and
respectively repeat each entry of `y` 3 times, to correspond to each entry of
`x`. And this is exactly the sort of thing broadcasting does! We only need to
make the shapes of `x` and `y` match in such a way that the broadcasting will
@@ -217,7 +217,7 @@ from `x`, and the second dimension will correspond to values from `y`, i.e.,
`a[i, j]` will be `x[i] + y[j]`. Thus the resulting array will have shape `(3,
2)`. So to make `x` (which is shape `(3,)`) and `y` (which is shape `(2,)`)
broadcast to this, we need to make them `(3, 1)` and `(1, 2)`, respectively.
This can easily be done with `np.newaxis`.
This can easily be done with `np.newaxis`:

```py
>>> x[:, np.newaxis].shape
@@ -245,7 +245,7 @@ array([[101, 201],
[103, 203]])
```

Note: broadcasting automatically prepends shape `1` dimensions, so the
Note: broadcasting automatically prepends size `1` dimensions, so the
`y[np.newaxis, :]` operation is unnecessary.

```py
@@ -255,7 +255,7 @@ array([[101, 201],
[103, 203]])
```
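
As a cross-check, the broadcasted sum can be compared against `np.add.outer`, NumPy's ufunc method for exactly this kind of outer operation. This sketch reuses the `x` and `y` from above:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([100, 200])

outer = x[:, np.newaxis] + y  # (3, 1) broadcast with (2,) gives (3, 2)
print(outer)

# The ufunc .outer method computes the same "outer" sum directly
assert np.array_equal(outer, np.add.outer(x, y))

# Each entry is x[i] + y[j]
for i in range(len(x)):
    for j in range(len(y)):
        assert outer[i, j] == x[i] + y[j]
```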

As we saw [before](single-axis-tuple), size-1 dimensions may seem redundant,
As we saw [before](single-axis-tuple), size `1` dimensions may seem redundant,
but they are not a bad thing. Not only do they allow indexing an array
uniformly, they are also very important in the way they interact with NumPy's
broadcasting rules.
6 changes: 3 additions & 3 deletions docs/indexing-guide/multidimensional-indices/tuples.md
@@ -42,7 +42,7 @@ array([[[16, 17, 18, 19],
```

We also observe that integer indices remove the axis, and slices keep the axis
(even when the resulting axis has size-1):
(even when the resulting axis has size 1):

```py
>>> a[0].shape
@@ -344,14 +344,14 @@ because it means that you can index the array
uniformly.[^size-1-dimension-footnote] And this doesn't apply just to
indexing. Many NumPy functions reduce the number of dimensions of their output
(for example, {external+numpy:func}`numpy.sum`), but they have a `keepdims`
argument to retain the dimension as a size-1 dimension instead.
argument to retain the dimension as a size `1` dimension instead.
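
A small sketch of `keepdims` with `np.sum`; the array is arbitrary. Keeping the reduced axis as size `1` is what lets the result broadcast back against the original array:

```python
import numpy as np

a = np.arange(12).reshape((3, 4))

print(np.sum(a, axis=1).shape)                 # (3,)
print(np.sum(a, axis=1, keepdims=True).shape)  # (3, 1)

# The kept size 1 dimension lets the row totals broadcast against a.
# Without keepdims, shapes (3, 4) and (3,) would not broadcast together.
row_totals = np.sum(a, axis=1, keepdims=True)
fractions = a / row_totals  # each element's fraction of its row total
print(fractions.shape)  # (3, 4)
```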

[^size-1-dimension-footnote]: In this example, if we knew that we were always
going to select exactly one element (say, the second one) from the first
dimension, we could equivalently use `a[1, np.newaxis]` (see
[](../integer-indices.md) and [](newaxis.md)). The advantage of this is
that we would get an error if the first dimension of `a` didn't actually
have `2` elements, whereas `a[1:2]` would just silently give a size-0
have `2` elements, whereas `a[1:2]` would just silently give a size `0`
array.
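
A sketch of the difference described in this footnote, with arbitrary arrays: `a` has enough rows, while `b` has only one:

```python
import numpy as np

a = np.zeros((3, 4))
print(a[1:2].shape)            # (1, 4)
print(a[1, np.newaxis].shape)  # (1, 4)

b = np.zeros((1, 4))  # the first dimension does not have 2 elements
print(b[1:2].shape)   # (0, 4): silently empty
try:
    b[1, np.newaxis]
except IndexError as e:
    print("IndexError:", e)
```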

There are two final facts about tuple indices that should be noted before we
Original file line number Diff line number Diff line change
@@ -26,9 +26,9 @@ subsets of it.
```

You can imagine all sorts of different things you'd want to do with your
scores that might involve selecting individual scores or ranges of scores (for
scores that might involve selecting individual scores or ranges of scores. For
example, with the above examples, we could easily compute the average score of
our last three games, and see how it compares to our first game). So hopefully
our last three games, and see how it compares to our first game. So hopefully
you are convinced that at least the types of indices we have learned so far
are useful.
