|
1 | 1 | \hsection{Text Strings}%
|
2 | 2 | \label{sec:str}%
|
3 | 3 | %
|
4 |
| -The fourth important datatype in \python\ are text strings. |
| 4 | +The fourth and last important basic datatype in \python\ is the text string. |
5 | 5 | Text strings are sequences of characters of an arbitrary length.
|
6 | 6 | In \python, they are represented by the datatype \pythonilIdx{str}.
|
7 | 7 | Indeed, we have already used it before, even in our very first example program, which simply printed \pythonil{"Hello World"}, see \cref{lst:very_first_program} in \cref{sec:ourFirstProgram}.
|
|
16 | 16 | As \cref{exec:str_indexing} shows, there are two basic ways to specify a text string literal\pythonIdx{str!literal}:
|
17 | 17 | Either enclosed by double quotes, e.g., \pythonil{"Hello World!"}\pythonIdx{\textquotedbl\idxdots\textquotedbl} or enclosed by single quotes, e.g., \pythonil{'Hello World!'}\pythonIdx{\textquotesingle\idxdots\textquotesingle}.
|
18 | 18 | The quotation marks are only used to delimit the strings, i.e., to tell \python\ where the string begins or ends.
|
19 |
| -They are not themselves part of the string. |
20 |
| - |
21 |
| -\bestPractice{strDoubleQuote}{When defining a string literal, the double-quotation mark variant~(\pythonil{"..."}\pythonIdx{\textquotedbl\idxdots\textquotedbl}) may be preferred over the single-quotation mark variant~(\pythonil{'...'}\pythonIdx{\textquotesingle}); see also~\cref{bp:longstrDoubleQuote}.} |
22 |
| - |
| 19 | +They are not themselves part of the string.% |
| 20 | +% |
| 21 | +\bestPractice{strDoubleQuote}{When defining a string literal, the double-quotation mark variant~(\pythonil{"..."}\pythonIdx{\textquotedbl\idxdots\textquotedbl}) may be preferred over the single-quotation mark variant~(\pythonil{'...'}\pythonIdx{\textquotesingle}). (\citetitle{PEP8}~\cite{PEP8} does not give a recommendation here, but double quotes are consistent with the docstring convention of~\citetitle{PEP257}~\cite{PEP257}; see also~\cref{bp:longstrDoubleQuote}.)}% |
| 22 | +% |
23 | 23 | One basic operation is string concatenation\pythonIdx{str!concatenation}\pythonIdx{str!+}\pythonIdx{+}:
|
24 | 24 | \pythonil{"Hello" + ' ' + "World"}\pythonIdx{\textquotedbl\idxdots\textquotedbl}\pythonIdx{\textquotesingle\idxdots\textquotesingle} concatenates the three strings \pythonil{"Hello"}, \pythonil{" "}, and \pythonil{"World"}.
|
25 | 25 | The result is \pythonil{"Hello World"}\pythonIdx{\textquotedbl\idxdots\textquotedbl}.
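The concatenation described above can be tried out with a minimal sketch like the following. The variable name \pythonil{greeting} is only chosen for illustration and does not appear in the accompanying example files.

```python
# Both quoting styles create the same kind of object: a str.
# The quotation marks only delimit the literal and are not part of the text.
greeting = "Hello" + ' ' + "World"

print(greeting)       # Hello World
print(len("Hello"))   # 5, the two quotation marks are not counted
```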
|
|
68 | 68 | Finally, we can also omit the start index, in which case everything until right before the end index is returned.
|
69 | 69 | Therefore, \pythonil{"Hello"[:-2]} will return everything from the beginning of the string until right before the second-to-last character.
|
70 | 70 | This gives us \pythonil{"Hel"}.
|
| 71 | +The slice~\pythonil{[1:8:2]} returns the substring starting at index~1 and ending before index~8, containing every second character. |
| 72 | +Applied to~\pythonil{"Hello World!"} it therefore yields~\pythonil{"el o"}. |
71 | 73 | We will discuss slicing again later when we look at lists in~\cref{sec:lists}.
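As a small sketch of the slicing rules just described, the following snippet repeats the two slices from the text. The variable names are assumptions made for this illustration only.

```python
text = "Hello World!"

# Omitting the start index: everything up to, but not including,
# the second-to-last character.
without_tail = "Hello"[:-2]

# Start at index 1, stop before index 8, take every second character.
every_second = text[1:8:2]

print(without_tail)   # Hel
print(every_second)   # el o
```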
|
72 | 74 |
|
73 | 75 | \gitEvalPython{str_basic_ops}{}{simple_datatypes/str_basic_ops.py}%
|
|
87 | 89 | It returns \pythonil{6}, because the \inQuotes{W} of \inQuotes{World} is the seventh character in this string and the indices are zero-based.
|
88 | 90 | Trying to find the \pythonil{"world"} in \pythonil{"Hello World!"} yields~\pythonil{-1}, however.
|
89 | 91 | \pythonil{-1} means that the string cannot be found.
|
| 92 | + |
90 | 93 | We learn that string operations are case-sensitive\pythonIdx{str!case-sensitive}:
|
91 |
| -\pythonil{"World" != "world"} would be \pythonilIdx{True}. |
| 94 | +The uppercase character~\inQuotes{W} is different from the lowercase character~\inQuotes{w}. |
| 95 | +Therefore, \pythonil{"World" != "world"} is~\pythonilIdx{True}. |
| 96 | +For the same reason, \pythonil{"world"} cannot be found in \pythonil{"Hello World!"}. |
92 | 97 | We also learn that we must not use the result of \pythonilIdx{find} directly as an index into a string without first checking that it is \pythonil{>= 0}!
|
93 | 98 | As you have learned, \pythonil{-1} is a perfectly fine index into a string, even though it means that the string we tried to find was not found.
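The pitfall can be illustrated with a short sketch. The names \pythonil{needle}, \pythonil{message}, and \pythonil{unchecked} are hypothetical and only serve this example.

```python
text = "Hello World!"
needle = "world"  # lowercase, so it does not occur in text

index = text.find(needle)

# Check the result before using it as an index.
if index >= 0:
    message = f"found {needle!r} at index {index}"
else:
    message = f"{needle!r} not found"
print(message)  # 'world' not found

# Without the check, -1 silently selects the last character instead.
unchecked = text[index]
print(unchecked)  # !
```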
|
94 | 99 |
|
|
106 | 111 | If we want to search from the end of the string, we use \pythonilIdx{rfind}.
|
107 | 112 | \pythonil{"Hello World!".rfind("l")} gives us~\pythonil{9} directly.
|
108 | 113 | If we want to search for the~\inQuotes{l} before that one, we need to supply an inclusive starting and exclusive ending index of the range to be searched.
|
109 |
| -\pythonil{"Hello World!".rfind("l", 0, 9)} searches for any~\inQuotes{l} from index~8 down to~0 and thus returns~\pythonil{3}. |
| 114 | +\pythonil{"Hello World!".rfind("l", 2, 9)} searches for any~\inQuotes{l} from index~8 down to~2 and thus returns~\pythonil{3}. |
110 | 115 | \pythonil{"Hello World!".rfind("l", 0, 3)} gives us~\pythonil{2} and since there is no~\inQuotes{l} before that, \pythonil{"Hello World!".rfind("l", 0, 2)} yields~\pythonil{-1}.
|
111 | 116 | \end{sloppypar}%
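The stepwise search from the end can also be written as a loop. This is only a sketch under the assumption that we want all positions of~\inQuotes{l}; the names \pythonil{positions} and \pythonil{end} are made up for this illustration.

```python
text = "Hello World!"

positions = []
end = len(text)  # exclusive end index of the range still to be searched
while True:
    index = text.rfind("l", 0, end)
    if index < 0:  # -1 means that no further occurrence exists
        break
    positions.append(index)
    end = index  # continue searching strictly before this occurrence

print(positions)  # [9, 3, 2]
```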
|
112 | 117 | %
|
113 | 118 | \begin{sloppypar}%
|
114 | 119 | Another common operation is to replace substrings with something else.
|
115 | 120 | \pythonil{"Hello World!".replace("Hello", "Hi")}\pythonIdx{replace} replaces all occurrences of \inQuotes{Hello} in \inQuotes{Hello World!} with \inQuotes{Hi}.
|
116 |
| -The result is \pythonil{"Hi World!"} and \pythonil{"Hello Hello World!".replace("Hello", "Hi")} becomes \pythonil{"Hi Hi World!"}. |
| 121 | +The result is \pythonil{"Hi World!"} and \pythonil{"Hello World! Hello!".replace("Hello", "Hi")} becomes \pythonil{"Hi World! Hi!"}. |
| 122 | +It does not replace strings recursively, though. |
| 123 | +If you try to do \pythonil{"Hello World!".replace("Hello", "Hello! Hello!")}, then the \pythonil{"Hello"} is indeed replaced with \pythonil{"Hello! Hello!"}. |
| 124 | +This means that the new string now contains \pythonil{"Hello"} twice. |
| 125 | +These new occurrences are \emph{not} replaced, so the result remains as \pythonil{"Hello! Hello! World!"}.% |
117 | 126 | \end{sloppypar}%
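The three replacement cases from the text can be checked with the following sketch. The variable names are assumptions for illustration.

```python
single = "Hello World!".replace("Hello", "Hi")
multiple = "Hello World! Hello!".replace("Hello", "Hi")

# The replacement text itself contains "Hello", but replace makes only
# one pass over the original string and does not revisit its own output.
expanded = "Hello World!".replace("Hello", "Hello! Hello!")

print(single)    # Hi World!
print(multiple)  # Hi World! Hi!
print(expanded)  # Hello! Hello! World!
```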
|
118 | 127 | %
|
119 | 128 | \begin{sloppypar}%
|
120 |
| -Often, we want to remove all leading or trailing whitespace characters from a string. |
| 129 | +Often, we want to remove all leading or trailing whitespace characters~(spaces, newlines, tabs, \dots) from a string. |
121 | 130 | The \pythonilIdx{strip} function does this for us:
|
122 | 131 | \pythonil{" Hello World! ".strip()} returns \pythonil{"Hello World!"}, i.e., the same string, but with the leading and trailing spaces removed.
|
123 | 132 | If we only want to remove the spaces on the left-hand side, we use \pythonilIdx{lstrip} and if we only want to remove those on the right-hand side, we use \pythonilIdx{rstrip} instead.
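A brief sketch comparing the three functions follows. The string \pythonil{padded} is a hypothetical input chosen for this illustration.

```python
padded = "  Hello World!  "

both = padded.strip()    # whitespace removed on both sides
left = padded.lstrip()   # only leading whitespace removed
right = padded.rstrip()  # only trailing whitespace removed

print(repr(both))   # 'Hello World!'
print(repr(left))   # 'Hello World!  '
print(repr(right))  # '  Hello World!'

# Tabs and newlines count as whitespace, too.
print(repr("\t Hello\n".strip()))  # 'Hello'
```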
|
|
134 | 143 |
|
135 | 144 | Of course, these were just a small selection of the many string operations available in \python.
|
136 | 145 | You can find more in the \href{https://docs.python.org/3/library/stdtypes.html\#textseq}{official documentation}~\cite{PSF:P3D:TPSL:TSTS}.%
|
| 146 | +\FloatBarrier% |
137 | 147 | \endhsection%
|
138 | 148 | %
|
139 | 149 | \hsection{The str Function and f-strings}%
|
|
273 | 283 | For example, you could write \pythonil{f"\{23\ *\ sin(2\ -\ 5)\ =\ :.2f\}"} and then the \pythonil{.2f} format would be applied to the result of the expression, i.e., you would get \pythonil{"23 * sin(2 - 5) = -3.25"} as the result of the interpolation.
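The example can be reproduced with the sketch below. It assumes Python~3.8 or newer, where the \pythonil{=} specifier is available, and imports \pythonil{sin} from the \pythonil{math} module. The variable name \pythonil{result} is made up for this illustration.

```python
from math import sin

# The text before the colon is echoed verbatim, including the spaces
# around "="; the format specifier .2f is applied to the computed value.
result = f"{23 * sin(2 - 5) = :.2f}"

print(result)  # 23 * sin(2 - 5) = -3.25
```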
|
274 | 284 |
|
275 | 285 | You are now able to convert the results of your computations to nice text.%
|
| 286 | +\FloatBarrier% |
276 | 287 | \endhsection%
|
277 | 288 | %
|
278 | 289 | \hsection{Converting Strings to other Datatypes}%
|
|
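298 | 309 | Finally, note that the function \pythonilIdx{bool}\pythonIdx{bool!function} does not parse the strings \pythonil{"True"} and \pythonil{"False"}: it converts every non-empty string, including \pythonil{"False"}, to \pythonilIdx{True} and only the empty string~\pythonil{""} to \pythonilIdx{False}.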
|
299 | 310 | With this, you are also able to convert strings to data that you can use as input for your computations.%
|
300 | 311 | %
|
| 312 | +\FloatBarrier% |
301 | 313 | \endhsection%
|
302 | 314 | %
|
303 | 315 | %
|
|
359 | 371 | We already learned the sequences \inQuotes{\textbraceleft\textbraceleft}\pythonIdx{\textbraceleft\textbraceleft} and \inQuotes{\textbraceright\textbraceright}\pythonIdx{\textbraceright\textbraceright} that were designed for \pglspl{fstring} only.
|
360 | 372 | The backslash-based escape sequences we discussed in this section work for both \pglspl{fstring} and normal strings.%
|
361 | 373 | \pythonIdx{str!escaping}\pythonIdx{escaping}%
|
| 374 | +\FloatBarrier% |
362 | 375 | \endhsection%
|
363 | 376 | %
|
364 | 377 | \hsection{Multi-Line Strings}%
|
|
372 | 385 | Such string delimiters are used for multi-line strings.
|
373 | 386 | In such strings, you can insert linebreaks by hitting \keys{\enter} completely normally.
|
374 | 387 | You can use the escape sequences from the previous section as well.
|
375 |
| -The main use case are \pglspl{docstring}, which we will discuss later, see, e.g., \cref{bp:module:docstrings}. |
376 |
| - |
377 |
| -\bestPractice{longstrDoubleQuote}{When defining a multi-line string literal, the double-quotation mark variant~(\pythonil{"""..."""})\pythonIdx{\textquotedbl\textquotedbl\textquotedbl\idxdots\textquotedbl\textquotedbl\textquotedbl} is usually preferred over the single-quotation mark variant~(\pythonil{'''...'''}\pythonIdx{\textquotesingle\textquotesingle\textquotesingle})~\cite{PEP257,PEP8}.} |
378 |
| - |
| 388 | +The main use case is \pglspl{docstring}, which we will discuss later, see, e.g., \cref{bp:module:docstrings}.% |
| 389 | +% |
| 390 | +\bestPractice{longstrDoubleQuote}{When defining a multi-line string literal, the double-quotation mark variant~(\pythonil{"""..."""})\pythonIdx{\textquotedbl\textquotedbl\textquotedbl\idxdots\textquotedbl\textquotedbl\textquotedbl} is preferred over the single-quotation mark variant~(\pythonil{'''...'''}\pythonIdx{\textquotesingle\textquotesingle\textquotesingle})~\cite{PEP257,PEP8}.}% |
| 391 | +% |
379 | 392 | \cref{exec:str_multiline} shows what happens if we print such a multi-line string.
|
380 | 393 | We first create the string by writing the three lines \textil{This is a multi-line string.}, \textil{I can hit enter to begin a new line.}, and \textil{This linebreak is then part of the string.}.
|
381 | 394 | The first line begins with \pythonil{"""}\pythonIdx{\textquotedbl\textquotedbl\textquotedbl\idxdots\textquotedbl\textquotedbl\textquotedbl} and the last one ends with \pythonil{"""}\pythonIdx{\textquotedbl\textquotedbl\textquotedbl\idxdots\textquotedbl\textquotedbl\textquotedbl} as well.
|
|
384 | 397 | We can also have multi-line \pglspl{fstring}\pythonIdx{str!f}\pythonIdx{f-string!multi-line}.
|
385 | 398 | These then simply start with \pythonil{f"""}\pythonIdx{f\textquotedbl\textquotedbl\textquotedbl\idxdots\textquotedbl\textquotedbl\textquotedbl}.
|
386 | 399 | The example in \cref{exec:str_multiline} presents such a multi-line \pgls{fstring}, which spans three lines and contains two expressions for \pgls{strinterpolation}.%
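The following sketch is not the program from \cref{exec:str_multiline}, but an independent illustration of the same idea. The names \pythonil{value} and \pythonil{report} and the contents of the f-string are assumptions.

```python
text = """This is a multi-line string.
I can hit enter to begin a new line.
This linebreak is then part of the string."""

print(text)
print(len(text.splitlines()))  # 3

# A multi-line f-string with two interpolated expressions.
value = 3
report = f"""value: {value}
double: {2 * value}
done"""

print(report)
```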
|
| 400 | +\FloatBarrier% |
387 | 401 | \endhsection%
|
388 | 402 | %
|
389 | 403 | \hsection{Unicode and Character Representation}%
|
|
439 | 453 | Anyway, in \cref{exec:str_unicode}, we use the information obtained in \cref{fig:unicodeCharacterTableSubset} to print the Chinese text \inQuotes{你好。} standing for \inQuotes{Hello.} and pronounced as \inQuotes{N{\v{\i}} h{\v{a}}o.} as a unicode-escaped string.
|
440 | 454 | We found that the character for \inQuotes{你} has unicode number~4f60, \inQuotes{好} has~597d, and the big period~\inQuotes{。} has~3002.
|
441 | 455 | The string \pythonil{"\\u4f60\\u597d\\u3002"} then corresponds to the correct Chinese text~\inQuotes{你好。}.%
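As a quick check of the code points named above, the following sketch builds the string from its escapes and converts the characters back to hexadecimal numbers. The variable names are hypothetical.

```python
# Each \uXXXX escape denotes one character by its hexadecimal code point.
escaped = "\u4f60\u597d\u3002"
print(escaped)  # 你好。

# ord returns the code point of a character; format it as hexadecimal.
code_points = [f"{ord(character):x}" for character in escaped]
print(code_points)  # ['4f60', '597d', '3002']
```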
|
| 456 | +\FloatBarrier% |
442 | 457 | \endhsection%
|
443 | 458 | %
|
444 | 459 | \hsection{Summary}%
|
|