Various grammar improvements suggested by ChatGPT and Claude
asmeurer committed Apr 10, 2024
1 parent 318c933 commit 98d81ef
Showing 5 changed files with 67 additions and 66 deletions.
24 changes: 13 additions & 11 deletions docs/indexing-guide/index.md
@@ -3,8 +3,8 @@
This section of the ndindex documentation discusses the semantics of NumPy
indices. This really is more of a documentation of NumPy itself than of
ndindex. However, understanding the underlying semantics of indices is
critical making the best use of ndindex, as well as for making the best use of
NumPy arrays themselves. Furthermore, the sections on [integer
critical to making the best use of ndindex, as well as for making the best use
of NumPy arrays themselves. Furthermore, the sections on [integer
indices](integer-indices) and [slices](slices-docs) also apply to the built-in
Python sequence types like `list` and `str`.

@@ -15,7 +15,7 @@ beginner.


(what-is-an-index)=
## What is an index?
## What is an Index?

Nominally, an index is any object that can go between the square brackets
after an array. That is, if `a` is a NumPy array, then in `a[x]`, *`x`* is an
@@ -36,10 +36,11 @@ semantics outlined here.
type](slices-docs). The term "index" is used in the Python language itself
(e.g., in the built-in exception type `IndexError`).

Semantically, an index `x` picks, or *indexes*[^indexes-footnote], some subset of the elements of `a`. An index
`a[x]` always either returns a new array with some subset of the elements of
`a`, or it raises `IndexError`. The most important rule for indexing, which
applies to all types of indices, is this:
Semantically, an index `x` selects, or *indexes*[^indexes-footnote], some
subset of the elements of `a`. An index `a[x]` always either returns a new
array with some subset of the elements of `a`, or it raises `IndexError`. The
most important rule for indexing, which applies to all types of indices, is
this:

[^indexes-footnote]: For clarity, in this document, and throughout the ndindex
documentation, the plural of *index* is *indices*. *Indexes* is always a
@@ -109,7 +110,7 @@ So the following are always true about any index:
produce an array with the exact same resulting shape with elements in the
exact same corresponding places.

To be sure, it is possible to *construct* indices that chose specific elements
To be sure, it is possible to *construct* indices that choose specific elements
based on their values. A common example of this is masks (i.e., [boolean array
indices](boolean-array-indices)), such as `a[a > 0]`, which selects all the
elements of `a` that are greater than zero. However, the resulting index
@@ -123,7 +124,7 @@ commonly desired indexing operations are represented by the basic indices such
as [integer indices](integer-indices), [slices](slices-docs), and
[ellipses](ellipsis-indices).
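
The point about masks such as `a[a > 0]` can be illustrated with a small sketch. The arrays `a` and `b` and the names `mask` and `selected` are made up for illustration. The mask is built from the values of `a`, but once it exists it selects purely by position, so it picks the same positions out of any other array of the same shape:

```python
import numpy as np

a = np.array([3, -1, 4, -5])
b = np.array([10, 20, 30, 40])  # hypothetical second array with the same shape

mask = a > 0           # constructed from the values of a
positive = a[mask]     # the elements of a that are greater than zero
selected = b[mask]     # the same positions, taken from b instead
```

Here `mask` is `[True, False, True, False]`, so both index operations select positions 0 and 2, regardless of what values are stored there.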

## Sections in this Guide
## Overview of this Guide

This guide is split into four sections.

@@ -148,13 +149,14 @@ a set of miscellaneous topics about NumPy arrays that are useful for
understanding how indexing works, such as broadcasting, views, strides, and
ordering.

## Footnotes
## Table of Contents

```{toctree}
:titlesonly:
:hidden:
integer-indices.md
slices.md
multidimensional-indices.md
other-topics.md
```
## Footnotes
66 changes: 32 additions & 34 deletions docs/indexing-guide/integer-indices.md
@@ -1,13 +1,13 @@
# Integer Indices

The simplest possible index type is an integer index, that is `a[i]` where `i`
The simplest possible index type is an integer index, that is, `a[i]` where `i`
is an integer like `0`, `3`, or `-2`.

Integer indexing operates on the familiar Python data types `list`, `tuple`,
and `str`, as well as NumPy arrays.

(prototype-example)=
Let's use as an example this prototype list:
Let us consider the following prototype list as an example:

<!-- TODO: Differentiate without color -->
<div class="slice-diagram">
@@ -29,26 +29,21 @@ Let's use as an example this prototype list:
The list `a` has 7 elements.

The elements of `a` are strings, but the indices and slices on the list `a`
will always use integers. Like [all other index types](what-is-an-index),
**the result of an integer index is never based on the values of the elements,
but rather on their positions in the list.**[^dict-footnote]
will always use integers. As with [all other index types](what-is-an-index),
**the result of an integer index is never based on the values of the elements;
it is based instead on their positions in the list.**[^dict-footnote]

[^dict-footnote]: If you are looking for something that allows non-integer
indices or that indexes by value, you may want a `dict`.

An integer index picks a single element from the list `a`.
An integer index selects a single element from the list `a`.

> **The key thing to remember about indexing in Python, both for integer and
slice indexing, is that it is 0-based.**

(fourth-sentence)=
This means that the indices start
counting at 0 (like "0, 1, 2, ..."). This is the case for all *nonnegative*
indices[^nonnegative]. For example, `a[3]` would pick the *fourth* element of
`a`, in this case, `'d'`:

[^nonnegative]: In this guide, "*nonnegative*" means $\geq 0$ and
"*negative*" means $< 0$.
This means that indices start at 0 ("0, 1, 2, ..."). For example,
`a[3]` selects the *fourth* element of `a`, in this case, `'d'`:

<div class="slice-diagram">
<code style="font-size: 16pt;">a[3] == 'd'</code>
@@ -92,7 +87,7 @@ programmer, especially if you are planning to work with arrays.
For *negative* integers, indices index from the end of the list. These indices
are necessarily 1-based (or rather, &minus;1-based), since `0` already refers
to the first element of the list. `-1` chooses the last element, `-2` the
second-to-last, and so on. For example, `a[-3]` picks the *third-to-last*
second-to-last, and so on. For example, `a[-3]` selects the *third-to-last*
element of `a`, in this case, `'e'`:


@@ -130,7 +125,7 @@ element of `a`, in this case, `'e'`:
```

An equivalent way to think about negative indices is that an index
`a[-i]` picks `a[len(a) - i]`, that is, you can subtract the negative
`a[-i]` selects `a[len(a) - i]`, that is, you can subtract the negative
index off of the size of `a` (for a NumPy array, replace `len(a)`
with the size of the axis being sliced). For example, `len(a)` is `7`, so
`a[-3]` is the same as `a[7 - 3]`:
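
A minimal sketch of this equivalence, assuming the prototype list `a` holds the seven one-letter strings `'a'` through `'g'` (consistent with `a[3] == 'd'` and `a[-3] == 'e'` above). The names `i`, `neg`, and `pos` are illustrative only:

```python
# Assumed contents of the prototype list (7 elements).
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

i = 3
neg = a[-i]           # index from the end
pos = a[len(a) - i]   # the equivalent nonnegative index, here a[4]

# The equivalence only covers i >= 1: -0 is just 0, so a[-0] is the
# first element rather than a[len(a)], which would be out of bounds.
first = a[-0]
```

Both `neg` and `pos` refer to the same element, `'e'`.
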
@@ -161,11 +156,11 @@ Traceback (most recent call last):
IndexError: list index out of range
```

For NumPy arrays, this applies to the size of the axis being indexed (not the
total size of the array):
For NumPy arrays, `i` is bounded by the size of the axis being indexed (not
the total size of the array):


```
```py
>>> import numpy as np
>>> a = np.ones((2, 3)) # A has 6 elements but the first axis is size 2
>>> a[2]
@@ -178,8 +173,8 @@ Traceback (most recent call last):
IndexError: index -3 is out of bounds for axis 0 with size 2
```

Fortunately, NumPy arrays give more helpful error messages for `IndexError`
than Python does for `list`.
Fortunately, NumPy arrays give more helpful `IndexError` error messages than
Python lists do.

The second important fact about integer indexing is that it reduces the
dimensionality of the container being indexed. For a `list` or `tuple`, this
@@ -200,12 +195,15 @@ Python. A single character is just represented as a string of length 1.
<class 'str'>
```

<!-- TODO: Expand on the behavior for multidimensional arrays -->

For NumPy arrays, an integer index always indexes a single axis of the array.
By default, it indexes the first axis, unless it is part of a larger
[multidimensional index](multidimensional-indices). The resulting array is always an
array with the dimensionality reduced by 1, namely, the axis being indexed is
removed from the resulting shape. This is contrast with [slices](slices-docs), which always
[maintain the dimension being sliced](subarray).
[multidimensional index](multidimensional-indices). The resulting array is
always an array with the dimensionality reduced by 1, namely, the axis being
indexed is removed from the resulting shape. This is in contrast with
[slices](slices-docs), which always [maintain the dimension being
sliced](subarray).

```py
>>> a = np.ones((2, 3, 4))
@@ -222,19 +220,19 @@ removed from the resulting shape. This is contrast with [slices](slices-docs), w
The resulting array is a subarray corresponding to the `i`-th position along
the given axis, using the 0- and &minus;1-based rules discussed above.
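
The contrast between integer indices and slices can be seen by comparing shapes. This is a small sketch using the same `np.ones((2, 3, 4))` array as above; the names `by_integer`, `by_slice`, and `from_end` are illustrative:

```python
import numpy as np

a = np.ones((2, 3, 4))

by_integer = a[0]    # integer index: the first axis is removed
by_slice = a[0:1]    # slice selecting one element: the first axis is kept
from_end = a[-1]     # negative integer index: also removes the first axis

shapes = (by_integer.shape, by_slice.shape, from_end.shape)
```

The integer indices give shape `(3, 4)`, while the slice gives shape `(1, 3, 4)`.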

One way to think about integer indexing on a NumPy array is to think about
[the list-of-lists analogy](what-is-an-array). An integer index on the first
axis `a[i]` picks the index `i` sub-list at the top level of sub-list nesting,
and in general, an integer index `i` on axis `k` picks the sub-lists of index
`i` at the `k`-th nesting level.[^nesting-level] For example, if `l` is a
nested list of lists
A helpful analogy for understanding integer indexing on NumPy arrays is to
consider it in terms of a [list of lists](what-is-an-array). An integer index
on the first axis `a[i]` selects the `i`-th sub-list at the top level of
sub-list nesting. And in general, an integer index `i` on axis `k` selects the
`i`-th sub-lists at the `k`-th nesting level.[^nesting-level] For
example, if `l` is a nested list of lists

[^nesting-level]: Thinking about the `k`-th level of nesting can get
confusing. For instance, it's not clear whether `k` should be counted with
0-based or 1-based numbering, and which level counts as which considering
that at the outermost "level" there is always a single list. List-of-lists
confusing. For instance, it is unclear whether `k` should be counted with
0-based or 1-based numbering, or which level counts as which, considering
that at the outermost "level," there is always a single list. List-of-lists
is a good analogy for thinking about why one might want to use an nd-array
in the first place, but as you actually use NumPy arrays in practice,
in the first place. But as you actually use NumPy arrays in practice,
you'll find it's much better to think about dimensions and axes directly,
not "levels of nesting".

8 changes: 4 additions & 4 deletions docs/indexing-guide/multidimensional-indices.md
@@ -379,7 +379,7 @@ array([[ 8, 9, 10, 11],
[12, 13, 14, 15]])
```

You might have noticed something about this. It is picking the second element
You might have noticed something about this. It is selecting the second element
of the first axis. But from what we said earlier, we can also do this just by
using the basic index `1`, which will operate on the first axis:

@@ -428,7 +428,7 @@ array([[ 0, 4],

`:` serves as a convenient way to "skip" axes. It is one of the most common
types of indices that you will see in practice for this reason. However, it is
important to remember that `:` is not special. It is just a slice, which picks
important to remember that `:` is not special. It is just a slice, which selects
every element of the corresponding axis. We could also replace `:` with `0:n`,
where `n` is the size of the corresponding axis (see the [slices
documentation](omitted)).
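
As a quick check of the claim that `:` is just an ordinary slice, the sketch below compares `:` with an explicit `0:n`. The array `a` here is a hypothetical example built with `np.arange`, not necessarily the array used elsewhere in the guide:

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))  # hypothetical example array
n = a.shape[0]                        # size of the first axis

with_colon = a[:, 0]      # "skip" the first axis, index the second
with_bounds = a[0:n, 0]   # the same index with the slice written out

same = np.array_equal(with_colon, with_bounds)
```

Both forms select every element along the first axis, so the results have the same shape, `(2, 4)`, and the same contents.
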
@@ -491,7 +491,7 @@ Here `b = a[:2]` has shape `(2, 2, 4)`
(2, 2, 4)
```

But suppose instead we used a slice that only picked one element from the
But suppose instead we used a slice that only selected one element from the
first axis

```py
@@ -1682,7 +1682,7 @@ array([-10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2,
3, 4, 5, 6, 7, 8, 9, 10])
```

Say we want to pick the elements of `a` that are both positive and odd. The
Say we want to select the elements of `a` that are both positive and odd. The
boolean mask `a > 0` represents which elements are positive and the boolean
mask `a % 2 == 1` represents which elements are odd. So our mask would be
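
One way to combine the two masks is sketched here, assuming `a` is the array of integers from -10 to 10 shown above. The sketch uses NumPy's elementwise `&` operator; the names `mask` and `positive_odd` are illustrative:

```python
import numpy as np

a = np.arange(-10, 11)  # the integers -10, ..., 10

# Parentheses are needed because & binds more tightly than > and ==.
mask = (a > 0) & (a % 2 == 1)

positive_odd = a[mask]
```

The combined mask is `True` only where both conditions hold, so `positive_odd` contains `1, 3, 5, 7, 9`.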

11 changes: 6 additions & 5 deletions docs/indexing-guide/other-topics.md
@@ -593,11 +593,12 @@ True
```

Operating on memory that is contiguous allows the CPU to place the entire
memory in the cache at once, and as a result is more performant. This won't be
visible for our example `a` above, which is small enough to fix in cache
entirely, but matters for larger arrays. Compare the time to sum along `a[0]`
or `a[..., 0]` for C and Fortran ordered arrays for a 3-dimensional array with
a million elements (using [IPython](https://ipython.org/)'s `%timeit`):
memory in the cache at once, and as a result is more performant. The
performance difference won't be noticeable for our small example `a` above,
which is small enough to fix in cache entirely, but it matters for larger
arrays. Compare the time to sum along `a[0]` or `a[..., 0]` for C and Fortran
ordered arrays for a 3-dimensional array with a million elements (using
[IPython](https://ipython.org/)'s `%timeit`):

```
In [1]: import numpy as np
24 changes: 12 additions & 12 deletions docs/indexing-guide/slices.md
@@ -433,7 +433,7 @@ omitted `start`/`stop`](omitted)).
(wrong-rule-1)=
##### Wrong Rule 1: "A slice `a[start:stop]` slices the half-open interval $[\text{start}, \text{stop})$."

(or equivalently, "a slice `a[start:stop]` picks the elements $i$ such that
(or equivalently, "a slice `a[start:stop]` selects the elements $i$ such that
$\text{start} <= i < \text{stop}$")

This is *only* the case if the `step` is positive. It also isn't directly true
@@ -682,10 +682,10 @@ Rather than thinking about that, consider the spaces between the elements:
</div>


Using this way of thinking, the first element of `a` is to the left of
the "1-divider". An integer index `i` produces the element to the right of the
"`i`-divider", and a slice `a[i:j]` picks the elements between the `i` and `j`
dividers.
Using this way of thinking, the first element of `a` is to the left of the
"1-divider". An integer index `i` produces the element to the right of the
"`i`-divider", and a slice `a[i:j]` selects the elements between the `i` and
`j` dividers.

At first glance, this seems like a rather clever way to think about the
half-open rule. For instance, between the `3` and `5` dividers is the subarray
@@ -1341,10 +1341,10 @@ The `stop` slice value is out of bounds for `a`, but this just causes it to

But `start` contains a subtraction, which causes it to become negative. Rather
than clipping to the start, it wraps around and indexes from the end of `a`,
producing the slice `a[-1:7]`. This picks the elements from the last element
(`'g'`) up to but not including the 7th element (0-based). Index `7` is out of
bounds for `a`, so this picks all elements including and after `'g'`, which in
this case is just `['g']`.
producing the slice `a[-1:7]`. This selects the elements from the last
element (`'g'`) up to but not including the 7th element (0-based). Index `7`
is out of bounds for `a`, so this selects all elements including and after
`'g'`, which in this case is just `['g']`.
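
The wraparound can be reproduced directly. This sketch assumes the prototype list `a` holds `'a'` through `'g'`; the `max(start, 0)` clamp is only one hypothetical option and is not necessarily the fix the guide recommends:

```python
# Assumed contents of the prototype list (7 elements).
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

start, stop = -1, 7          # start became negative after a subtraction

wrapped = a[start:stop]      # -1 wraps around to the last element

# Hypothetical alternative: clamp start at 0 so it clips instead of wrapping.
clipped = a[max(start, 0):stop]
```

Here `wrapped` is `['g']`, whereas `clipped` is the whole list.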

Unfortunately, the "correct" fix here depends on the desired behavior for each
individual slice. In some cases, the "slice from the end" behavior of negative
@@ -1521,7 +1521,7 @@ already built-in to the `==` comparison.
This trick works especially well when working with strings. Unlike with lists,
both [integer ](integer-indices) and slice indices on a string result in
another string, so changing the code logic to work in this way often only
requires adding a `:` to the index so that it is a slice that picks a single
requires adding a `:` to the index so that it is a slice that selects a single
element instead of an integer index. For example, take a function like

```py
@@ -1558,7 +1558,7 @@ If a third integer is provided in a slice, like `i:j:k`, this third integer is
the step size. If it is not provided, the step size defaults to `1`.

Thus far, we have only considered slices with the default step size of 1. When
the step is greater than 1, the slice picks every `step` element contained in
the step is greater than 1, the slice selects every `step` element contained in
the bounds of `start` and `stop`.
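
A short sketch of steps greater than 1, again assuming the prototype list `a` holds `'a'` through `'g'` (the variable names are illustrative):

```python
# Assumed contents of the prototype list (7 elements).
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

every_third = a[0:7:3]    # indices 0, 3, 6
every_second = a[1:6:2]   # indices 1, 3, 5
default_step = a[1:6]     # omitting the step is the same as a step of 1
explicit_one = a[1:6:1]
```

Here `every_third` is `['a', 'd', 'g']` and `every_second` is `['b', 'd', 'f']`.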

> **The proper way to think about `step` is that the slice starts at `start`
@@ -2354,7 +2354,7 @@ changes I would make to improve the semantics would be
Every human being is taught from an early age to count from 1. If you show
someone the list "a, b, c", they will tell you that "a" is the 1st, "b" is
the 2nd, and "c" is the 3rd. [Sentences](fourth-sentence) in this guide
like "`a[3]` would pick the fourth element of `a`" sound very off, even for
like "`a[3]` selects the fourth element of `a`" sound very off, even for
those of us used to 0-based indexing. 0-based indexing requires a shift in
thinking from the way that you have been taught to count from early
childhood. Counting is a very fundamental thing for any human, but