Intro to Programming: Strings in Python

In the latest article in our Intro to Programming series, we'll talk about strings in Python.
By Ciprian Stratulat • Nov 2, 2021

Welcome back to the latest article in my Intro to Programming series. Today I’ll talk about strings in Python.

 

 

Strings vs. Variables

Use this image as a mental model for the data type “string.” Strings are just a bunch of characters on—you guessed it—a string. I’ll talk more on that in just a bit.

Since so much of our world is text-based (and has been since the invention of writing), you can imagine that strings play a big role in our computer programs. Many problems where computers are useful involve working with text in some form or another. Securing a good grasp on the concept of strings is key to using Python. So, let's see what they're all about!

By way of definition, strings are ordered sequences of characters surrounded by single or double quotes. We'll see in just a moment why Python supports both single and double quotes. Spoiler alert: it's not to intentionally confuse us. In fact, it actually makes things easier. Here are a few examples of using quotes:

"hello"

'hi2022'

"my name is Ciprian"

We can use either single quotes or double quotes in Python. In between quotes, we write the characters that make up our string.

Note that the surrounding quotes (single or double) are not part of the string output. So why put the quotes there at all?

 

Example 1

hi = 5

my_string = hi

# this prints 5
print(hi)

# this also prints 5
print(my_string)

Let's go over an example to answer this question. See Example 1 above. In this example, we define a variable named hi. We then assign it the value 5, which is an integer. Even though it’s a strange name, "hi" is a perfectly good name for a variable. Remember, your variable names should indicate what they're used for. (I can't think of what a variable named hi could possibly store, but it's a valid name, so for the sake of example this could happen.)

On the next line, right underneath, we define a variable called my_string and we assign it the value of the variable hi, which is 5. We can see that when we print the value of the variable hi we get 5 and when we print the value of the variable my_string, we also get 5.

 

Example 2

hi = 5

my_string = "hi"

# this prints 5
print(hi)

# this prints the word hi
print(my_string)

Now let's move on to Example 2 above. Here again we have a variable called hi and we assign it the value 5. But on the next line, we want to assign the variable called my_string the text "hi". We do that by surrounding the text hi with double quotes. You could also use single quotes, as we mentioned earlier.

If we forgot the quotes, the program would think that we meant to assign my_string the value that is stored in the variable named hi. If we did that, printing my_string would show 5, instead of the actual text hi. With the quotes in place, if we print the value of the variable hi we get 5 and if we print the value of the variable my_string we get the actual string hi.

So that's why quotes are important. Without quotes, the Python interpreter will think you’re referring to some variable and will try to find the value of that variable. If it can't find the variable, it will error out. Even worse, if it finds it, you’ll get unexpected results. So when you want to use strings, always remember the quotes! They’re very important.
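
To make the "it will error out" case concrete, here is a minimal sketch. It assumes that no variable named hello exists, and it uses try and except, which we haven't covered in this series yet, only so that the example keeps running after the error. The names hello, message, and my_string are just for illustration.

```python
# Assumption: no variable named hello has been defined anywhere.
try:
    my_string = hello  # no quotes, so Python looks for a variable named hello
except NameError as error:
    message = str(error)
    print("Python raised a NameError:", message)

# With quotes, Python treats the same characters as text.
my_string = "hello"
print(my_string)
```

The first print shows the error Python reports for the missing variable, and the second prints the word hello.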

 

Quotes and Escape Sequences

Let’s play with some strings in Python and learn how to use quotation marks inside a string. I’ll also show you why we can use either single quotes or double quotes for strings.

We’ll start in a Jupyter notebook. In a Jupyter cell, I can type out a string (always between quotes). If I press shift+enter, Jupyter will print the whole string back. But I can also do something a bit fancier. Let's assign this string to a variable called greeting. So now if I print the value of the variable called greeting, I should see the string printed back.

# this is a string
"happy birthday"

# let's assign this string to a variable
greeting = "happy birthday"

# this prints the words happy birthday
print(greeting)

Let’s also create a variable called question. We’ll assign it the string 'what day of the week is it?'. Next, we can create a variable called answer, and assign it the string 'yay, it's Friday!'. Notice that if I use single quotes to wrap this string, I get an invalid syntax error. The error points at the 's' right after the single quote in the word “it’s”. Why is that?

# assign the question 'what day of the week is it?' to a variable
question = 'what day of the week is it?'

# assign the answer 'yay, it's Friday!' to another variable.
# when using single quotes around this string, running this line will throw a syntax error
answer = 'yay, it's Friday!'

In this case, the Python interpreter understands that I’m trying to create a string because I’ve typed an open quote and a close quote, which mark a string. However, as far as the interpreter is concerned, my string ended right after the word 'it', because that's where it found the single quote that matched the opening quote. So the remaining text ('s') immediately after the single quote is nonsense. The interpreter doesn’t know what to do with it, which causes the error.

The Jupyter notebook helpfully shows strings in red, which makes them easy to distinguish from variable names, for example, which are black. The letters “s Friday!” do not appear red because the string ended before them, so they are not part of it.

From this example, it might seem that you can't ever use single quotes inside strings when also using single quotes to wrap the string. But in fact, there are two ways out of this situation. The first is to use double quotes. That's why we have them! Using double quotes, we can rewrite this whole thing as "yay, it's friday" and now it works perfectly.

So what if we have a string that contains multiple instances of double quotes, like "she said "hello""? In this case we can't use double quotes because the string has double quotes inside of it. But, we can use single quotes! We can rewrite it as 'she said "hello"' and now it works.

But what happens if you have both? Now we're in trouble. Or maybe not! In Python—and in many similar programming languages—a way exists to include both single and double quotes in your strings without confusing the interpreter. This method is called escaping the quote. It's perhaps a bit funny, trying to escape a quote, but that's what it's called. You can escape the quote by simply adding a backslash before the single or double quote that is causing issues. Let me show you: 'she said "hello, it\'s friday"'.

You might notice here that I have both single and double quotes in my string and I'm using single quotes on the outside, to wrap the string. So in this case, I simply put a backslash before the single quote inside the string and all is good.

This might seem a bit odd at first. Perhaps you're wondering how Python knows not to print the backslash. It's one of the rules that was built into the Python interpreter. Python actually considers \' to be a single character, not two, and the same is true for \". These are called escape sequences.
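
Here is a short sketch that puts the three options side by side. The variable names are my own, and I'm borrowing the len function (covered later in this article) only to show that an escape sequence counts as one character.

```python
# Option 1: wrap in double quotes so the apostrophe needs no escaping.
answer = "yay, it's Friday!"

# Option 2: wrap in single quotes so the inner double quotes need no escaping.
quote = 'she said "hello"'

# Option 3: escape the inner single quote with a backslash.
both = 'she said "hello, it\'s friday"'

print(answer)
print(quote)
print(both)

# The escape sequence \' counts as one character, not two.
print(len('\''))     # 1
# No backslash is stored in the string itself.
print("\\" in both)  # False
```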

There is another escape sequence that I want to show you. Let’s say we want to print a new line. We want to say "hello world", but more specifically we'd like to print “hello” on one line and “world” on a separate line. You might be tempted to just do this: print hello and hit enter, and type world and close the quotes.

Well, that won't work. In fact, doing so will get us another syntax error. That’s because the Python interpreter expects a string wrapped in single or double quotes to end on the same line it started. In other words, it all needs to be on one line. So how do you get a new line? By typing a backslash n, like this: print("hello\nworld").

When the code runs, the backslash n gets replaced by the newline character. The newline character is the character you'd get when hitting enter on your keyboard to start a new line.
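
As a small sketch of the same idea, the snippet below stores the string in a variable (the name message is my own) and inspects it. The repr function shows the string the way you would type it in code, and len (covered later in this article) confirms that \n is stored as a single character.

```python
message = "hello\nworld"

# Prints hello on one line and world on the next.
print(message)

# repr shows the escape sequence instead of an actual line break.
print(repr(message))  # 'hello\nworld'

# 5 letters + 1 newline character + 5 letters
print(len(message))   # 11
```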

So now we’ve learned how to write strings and about escape sequences that allow us to handle situations which would otherwise be tricky. Next, we’ll learn about some cool properties of strings and how to use them.

 

What Are Indexes?

Let’s continue our exploration of the string data type. In particular, we’ll talk next about indexing, including what an index is, how it works, and why we should care about it.

Let's go back to this image we looked at earlier. It shows the words "happy birthday" on a string. The image is a good analogy, since we use the word “string” for this data type to indicate a string as a sequence of characters placed on a literal string.

So let's write “happy birthday” out as a Python string. The end result should look like the words “happy birthday” between quotes. In this example, we'll use double quotes, but single quotes would work as well.

"happy birthday"

Let’s begin talking about indexing by looking at the word “index”.

an index is an integer number that indicates the location of something

One definition for the word is an indicator or a sign. Indexes can be, for example, indicators of location. Before we had digital records, libraries had index cards. The index cards told you where in the library you could find a particular book, among other details about the book itself. This is the meaning of “index” that we'll focus on. An index, in the context of Python programming, is an integer number that indicates the location of something.

This may sound a bit abstract, but bear with me for a moment. Let's go back to our "happy birthday" string. Underneath it, let's write the numbers 0, 1, 2, 3, ... all the way to 13. By doing so we can assign each character in the string a number, including the space between the words “happy” and “birthday”. These numbers indicate the location of each character inside the string. For example, the letter "h" is at location 0. In Python, as in many other programming languages, we start counting at 0. Typically, we'd refer to "h" as the first letter. But in code, we say it is at location 0. Similarly, the letter "b" is at position 6 and the space character (between happy and birthday) is at position 5.

All of these numbers are actually indexes. That is, they are integer numbers that indicate the location of each character inside the string. This concept is very similar to how books have indexes, which really are just lists of locations where you can find certain words or topics in the book.

an index is an integer number that indicates the location of an item that is part of a sequence

Let’s revisit the index definition we gave earlier to make it a bit more specific: an index is an integer number that indicates the location of an item that is part of a sequence. The reason I wanted you to see this more generalized definition is because indexes are relevant not just to strings, but to other data types as well, as we'll see later. So understanding indexes in this general context is useful.

Now, many programming languages will support this type of index, where you start on the left and count toward the right of a string. But Python goes beyond that, and actually supports a reverse index as well. To get the reverse index of a character in a string, we start at the end of the string and begin counting at -1, like so: "y" here is at location -1, "a" is at location -2, "d" is at location -3, and so on.

This is called the reverse index. The reason why we use negative numbers in the reverse index is because we already used positive numbers in the forward index. For example, if we used 0 again for the last character, we'd have two items living at the same address. This would be confusing. So, when we count left to right, we start at 0 and go up; when we count right to left, we start at -1 and go down.

So why does Python have a reverse index and a forward index? Because Python was built to make things easy. If you want, for example, to know the first character in a string, all you have to do is ask the string to return the character at index 0. If you want to know the last character in a string, you ask the string to return the character at index -1. We'll go over how exactly to do that in just a moment. For now, whenever you see a string, imagine numbers underneath each character. These are the indexes.
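
If you'd like to see those imaginary numbers written out, here is a rough sketch that prints the forward index and the reverse index next to each character. It uses a for loop, range, and len, none of which we've covered yet, so treat it as a preview rather than something you need to understand fully. The variable names are my own.

```python
greeting = "happy birthday"
length = len(greeting)  # 14 characters, counting the space

for forward_index in range(length):
    # The reverse index of a character is its forward index minus the length.
    reverse_index = forward_index - length
    # Both lookups return the same character.
    print(forward_index, reverse_index,
          repr(greeting[forward_index]), repr(greeting[reverse_index]))
```

The first row shows 0 and -14 for "h", and the last row shows 13 and -1 for "y".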

 

Indexes in Python

Let’s take a look at how indexes work in the context of Python strings.

Indexes are important because they allow us to inspect a string and extract from it characters that are located at specific locations. To show this in action, let's write some code.

greeting = "happy birthday"

# this will print the words happy birthday
print(greeting)

Let's begin by assigning the string “happy birthday” to a variable. Let's call this variable greeting and say it's equal to "happy birthday". Now, we can output greeting. If you are following along in a Jupyter notebook, remember that Jupyter notebooks allow us to type the name of the variable and it will output its value.

So now we ask the question, what is the first letter in our string? Remember, the first letter is at index 0. But how do we communicate to Python that we want the character that is at index 0 inside the greeting variable? For that, we use this special notation: greeting[0].

# this outputs the letter "h" from the string "happy birthday"
greeting[0]

This is just a notation. It's something that's easy enough for humans to write and easy enough for the Python interpreter to understand. Python developers could have come up with any number of ways to write it, but this is what they decided on—so don't worry too much about it. Just remember to use those square brackets. Many programming languages use the same notation, so at least you won't have a hard time getting used to it again if you decide to learn another programming language in the future.

So in Python, to get the character that is located at a certain index in a string, we simply write the variable that stores the string, followed by square brackets. In between the brackets, we write the index. Now let me show you something even cooler: the index itself can be a variable.

I’ll define a variable x and set it to 0. Right below, I’ll output the character that is located at index x. That is to say, the character at index 0 inside the greeting variable.

x = 0

# this also outputs the letter "h" from the string "happy birthday"
greeting[x]

The output gives the same thing: the character “h”. But what if we change x to 6 and run this cell again?

x = 6

# this outputs the letter "b" from the string "happy birthday"
greeting[x]

We get the character “b”, which is correct as we saw in the earlier image. If you're following along in your own Jupyter notebook, try a few on your own. I'm going to write a few more myself: greeting[2] is “p”, which makes sense. If you’re confused because “p” is actually the third letter in the word “happy”, remember that in Python we always start counting at 0. It might take a bit to get used to it, but the more code you write, the more it will make sense. You might even end up numbering your shopping list starting at 0, like I do.

Let's try one more. This time, let's try the reverse index. Calling greeting[-1] should return the very last character in the string.

x = -1

# this outputs the letter "y", the last one in the string "happy birthday"
greeting[x]

And it does, we get the character “y”. And calling greeting[-2] returns character “a”. Now let's try greeting[5]. The character at location 5 is actually a space and that's what is returned.

Now, you might be wondering about these results: the space or the “a” above, what data type are they? Remember, every piece of data in a Python program has a specific data type. So does a special single character data type exist that we haven't yet heard about? The answer is no. These characters are still a string, but they are strings that have a single character. Each string returned is the character that happens to be at the location we inquired about.

If you're curious, you can check the data type of any variable or piece of data by using the type function. We haven't yet talked about what functions are, but for now you can think of them as a synonym of “instruction”. Functions are named instructions that tell the computer to do a certain action. The Python print function, for example, prints whatever we put between the parentheses. The Python type function returns the type of whatever we put between the parentheses. Let me show you.

# this returns str, which is short for string
type("hello")

First, we’ll start with some easier examples. Inputting type('hello') returns str. The output str, if you remember the data types table I showed you in an earlier article, is short for string. We’re seeing again the use of parentheses. We saw them before when we used the print function.

Let's try a few more examples to see how type works. Inputting type(5), we see that it's an int.

# this returns int, which is short for integer
type(5)

Inputting type(11.3), we see that it’s a float.

# this returns float
type(11.3)

The type function is not very useful when the data type of an input is obvious. For example, “hello” is clearly a string and 5 is clearly an integer. It is very useful, however, if you have a variable and you don't know the type. Let's set x = 15, and right below it we can write type(x). We get int. Even this might not be very impressive because we know x is 15 and it is an integer. But what if you get the value for x from a file or a database? When writing a program, you can’t always anticipate x’s data type. So if you want to know, you can use the type function.
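
To illustrate, here is a small sketch with three made-up variables that stand in for values arriving from different places. The names and values are hypothetical. Note that outside of a Jupyter cell, printing a type shows it in a slightly longer form, such as <class 'str'>.

```python
# Hypothetical values, standing in for data read from a file or a database.
from_file = "15"
from_database = 15
from_sensor = 15.0

print(type(from_file))      # <class 'str'>
print(type(from_database))  # <class 'int'>
print(type(from_sensor))    # <class 'float'>
```

All three values look like the number 15 to a human, but Python treats them as three different data types.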

Remember the single character strings? If you want to convince yourself that they are indeed still strings, you can write type and then between parentheses write greeting[0]. The character at index 0 has the data type we want to check. The type function returns str.

# this returns str
type(greeting[0])

The type function can be very useful, but you probably won't use it much in the beginning. But it's good to know about. It's also good to know that when an index into a string returns a character, that piece of data has a data type of string as well.

# this raises an IndexError, because the string "happy birthday" has no character at index 100
greeting[100]

I want to cover one more thing about indexes. Let's go back to our greeting. If I output it again we see the phrase "happy birthday". But what happens if I try something silly, like ask Python to tell me what the character at location 100 is? If I type greeting[100], I get an IndexError. The error message is pretty clear: the variable greeting doesn't store a string that is more than 100 characters long, so nothing exists at location 100. We're trying to access a location that doesn't exist.

These index out of range errors actually happen more frequently than you might think. So be careful when indexing into a string. Make sure that location exists. How might you be sure that an index location exists? Well, one way is by looking at the total length of the string, and making sure you never inquire about an index that's equal to or higher than it. We'll cover how to get the length of a string in the String Length section. But for now, remember that if you use an index for a location that doesn't exist in a string, you'll get an error.

 

String Slicing

Next, let’s learn about string slicing in Python. The actions described in this section are done in the same Jupyter notebook as before.

Previously, we saw that indexing allows us to see a character that is at a given position in a string. So rather than asking about a single position, can we ask about multiple positions? For example, can we ask to see the characters located in a string between index 2 and index 5?

We actually can! This is called slicing. The Python notation for slicing is similar to that of indexing. In the previous section, we defined a variable called greeting and we set it to the string “happy birthday”. To get the characters located in this string between index 2 and index 5, we can write greeting[2:5]. So, just as with indexing, we use square brackets. Inside them, we write the index we want to start at, followed by a colon, and then by the index we want to stop at. The result is “ppy”.

# this outputs "ppy"
greeting[2:5]

Let's double check that. We know that in the phrase "happy birthday", “h” is at index 0, “a” is at index 1, and “p” is at index 2. The next “p” is at index 3 and “y” is at index 4. The Python interpreter returned another string that contains the characters that are at indexes 2, 3, and 4. But why didn't we get the character at index 5 as well? After all, it seems that we asked for all of the characters between index 2 and index 5.

Slicing returns the characters starting at the first index, which is 2 in our example, and up to but not including the stop index, which is 5. Let's try another example. Inputting greeting[0:2] will return “h” and “a”. The character “h” is at index 0, and “a” is at index 1. Again, we got all the characters starting at index 0 and up to but not including index 2.
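
One way to convince yourself of the "up to but not including" rule is to look at what sits on each side of the stop index. The sketch below does that; the variable names start, stop, and piece are my own.

```python
greeting = "happy birthday"
start = 2
stop = 5

piece = greeting[start:stop]
print(piece)                 # ppy
print(len(piece))            # 3, which is stop - start
print(greeting[stop - 1])    # y, the last character that is included
print(repr(greeting[stop]))  # ' ', the character at the stop index is left out
```

A handy side effect of this rule is that the number of characters in the slice is simply the stop index minus the start index, at least when both indexes exist in the string.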

You might forget this from time to time, but that's okay. The more code you write, the more it will become fixed in your memory. In fact, try slicing a string of your own choice in your Jupyter notebook. Repetition and practice are the best ways to get comfortable with these notations.

I’ll end this section with a couple more tricks that will come in handy. First, if you're slicing a string and want to start at the beginning and then go up to a given index, you can actually skip writing the starting index. Look at this example code: greeting[:2]. When inputting greeting[:2], we get the same result as greeting[0:2]. In this case, the Python interpreter assumes that it should start slicing at the beginning of the string.

# this outputs "ha"
greeting[0:2]

# this also outputs "ha"
greeting[:2]

# this outputs "ppy birthday"
greeting[2:]

# this outputs the entire string "happy birthday"
greeting[:]

Similarly, when you want to go all the way to the end of your string, you can skip the second index like this: greeting[2:]. In this case, the Python interpreter will start at index 2 and go all the way to the end of the string. So what happens if you skip both the first and last index and just write greeting[:]? Well, as you might expect, Python will return a slice of the string that starts at the beginning of the string and goes all the way to the end. In other words, the result will be the entire string. This particular notation is rarely used, because the simplest way to return the whole string doesn’t require any slicing. Just call the variable that the string is in, without any square brackets.
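
Here is a quick sketch of how the two shortcuts fit together. It uses + to join two strings, which is an assumption on my part since we haven't formally covered it, and the variable names are my own.

```python
greeting = "happy birthday"

first_part = greeting[:5]   # everything before index 5
second_part = greeting[5:]  # everything from index 5 onward

print(repr(first_part))   # 'happy'
print(repr(second_part))  # ' birthday'

# Joining the two slices with + gives back the original string.
print(first_part + second_part == greeting)  # True
```

Because the stop index is excluded from the first slice and included in the second, no character is lost or repeated.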

I encourage you to play around with string slicing. Try multiple combinations and see if they make sense.

 

Step Slicing

Next, we'll add another step to our string slicing using our Jupyter notebook. We’ll use a new string for these examples because the string “happy birthday” has some repeated characters that may cause confusion.

Let's define a new string variable called letters and assign it the string “abcdefghij”. You can use type to double check that it is a string: type(letters). Slices of strings are also called substrings. In this example, we want a substring of this string that starts at index 0 and goes all the way up to but not including index 9. We can do that pretty easily using letters[:9]. As discussed in the previous section, we skipped the starting index because Python will know that we want to start at index 0. The result is the string “abcdefghi”.

letters = 'abcdefghij'

# this returns str
type(letters)

# this returns 'abcdefghi'
letters[:9]

Now, what if we want to get every other character from this string? I don't want “abcdefghi”, but instead I want a string that looks like “acegi”. Can I do that easily using string slicing? The answer is—of course! To do so, we start with our normal notation for slicing (letters[:9]). But this time, after the 9 we add another colon and we write 2 (letters[:9:2]). Using this notation gives us the string we wanted.

# this outputs 'acegi'
letters[:9:2]

The integer after the second colon is called a step. A step of 2 tells Python to slice the string and select characters in jumps of 2. But you don't have to go in jumps of 2; you can pick any step you want. Let's try it in jumps of 3 using letters[:9:3]. Again, this notation tells Python to start at index 0 in the string, go all the way up to, but not including, the character at index 9, and select every third character. So letters[:9:3] returns “adg”.

# this returns "adg"
letters[:9:3]

I mentioned earlier that if we want the whole string we can drop both the start and end index. That rule still applies here. We can write letters[::3] to return “adgj”. This result includes the “j” because we asked to go all the way to the end. But what happens if I type letters[::1]? In this case, I get the whole string. Using letters[::1] is basically asking to slice starting at the beginning of the string, going all the way to the end of the string, while selecting every character.

# this returns "adgj"
letters[::3]

# this returns the full string, "abcdefghij"
letters[::1]
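
If the step still feels a bit magical, the sketch below spells out what a slice with a step is doing by visiting the same indexes one at a time. It relies on a for loop, range, and joining strings with +, which we haven't covered yet, so treat it as a preview. The variable name picked is my own.

```python
letters = "abcdefghij"

# Build the same result as letters[:9:2] by visiting indexes 0, 2, 4, 6, 8.
picked = ""
for index in range(0, 9, 2):
    picked = picked + letters[index]

print(picked)          # acegi
print(letters[:9:2])   # acegi

# All three parts can be combined: start at 2, stop before 9, jump by 3.
print(letters[2:9:3])  # cfi
```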

That's it for indexing and slicing. These are very important concepts when it comes to inspecting sequences in general in Python, and we'll run into them again when we talk about lists. So take some time to practice your indexing and slicing, and make sure you have a good grasp on them. They might seem silly on their own, but when you write code, you will index and slice all the time. Any input or data manipulation almost always involves some slicing and indexing.

 

Python Functions vs. Methods

Before we learn about useful properties and methods that we can use with strings, we should understand the similarities and differences between Python functions and methods.

In upcoming sections, we'll go over several useful functions and methods that can be used with strings. We'll cover Python functions in more detail later on, but for now, we'll work on building a basic mental model that can help in understanding these concepts better.

First, let's quickly review the functions we've seen so far. The first one was print. When we want to print something on the screen, we know that we can write the word print, and then, between parentheses, we put whatever we want to print. That can either be an actual string, an integer number, or even a variable. If we're printing a variable, Python will figure out what value is stored in that variable and it'll print out that value.

# this outputs the word something
print("something")

Then we saw type. And type looks very similar to print. Instead of printing the value of a variable or of a piece of data, type returns the data type of that variable or piece of data.

# this outputs str
type("something")

Initially and for simplification, I called these functions “instructions”. Doing so helps us think of them as commands to give our computers. In other words, we tell them to print something, or we ask them to tell us what the data type for a given variable is. In programming, we call these commands functions. Some of them are built into the Python programming language, but we'll soon learn how to create and use our own. And when I say functions, what I mean is commands or instructions that I can give to the computer. While this definition is a bit of a simplification, it is a good way to think about functions at this stage.

To successfully execute certain commands, or said another way, to successfully run some functions, the computer needs to know some details.

functions = instructions

For instance, if you tell your computer to print, it needs to know what you'd like to print. Below, the word “hello” answers the question, What do you want to print?

# Hey, print function, please print the string "hello"!
print("hello")

Similarly, if you want to know the data type, you need to provide the answer to the question, What do you want the data type of? We specify this using the parentheses. Don't confuse parentheses with the square brackets we saw in indexing and slicing.

# Hey, type function, what is the type of 5?
type(5)

The green keywords that we used, print and type, are the function names. That's probably intuitive enough.

What perhaps is not as intuitive is that we call “hello” and 5 parameters. (You will also hear them called arguments.)

I also want to introduce you to another bit of programming lingo. When you hear people say “call a function”, “run a function”, or “execute a function”, they all mean the same thing.

Programmer Speak

You...

   call a function

   run a function

   execute a function

And when you hear a programmer say they are “passing a parameter to a function”, they are calling or running the function with a parameter.

That's all we're going to discuss about functions for now.

Next, let’s look at methods.

At a high level, we can think of a method as a function that's attached to an object.

method = function that's attached to an object

So, what's an object?

In the Python programming language, almost everything is an object. Pretty much like in the real world.

all are objects:
strings, integer numbers, floating-point numbers, variables etc.

A lamp is an object, a car is an object, a variable is an object, a string is an object, an integer number is an object, and so on. For the purposes of this course, that's all we need to know about objects.

Going back to our definition of a method, we can say that a method is simply a function that's attached to a string, an integer number, or anything else we can call an object.

method = function that's attached to a string or an integer number etc.

So, why do we need to distinguish between functions and methods?

The answer is because we call/run/execute methods slightly differently than functions. In the next section, we'll see exactly how.

we call / run / execute methods in a slightly different way from functions
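
As a brief preview of that difference, here is a minimal sketch. It uses len, a built-in function, and upper, one of the methods built into Python strings. Picking upper is my own choice for illustration.

```python
greeting = "Good morning"

# A function call: the function name comes first, and the data goes between the parentheses.
print(len(greeting))     # 12

# A method call: the object comes first, then a dot, then the method name.
print(greeting.upper())  # GOOD MORNING
```

Notice that the method call still ends with parentheses, even though there is nothing between them.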

 

String Length

In this section, we’ll start exploring useful functions that work with strings. We'll begin with the len() function, which measures the length of a string.

In the previous section, we learned that functions and methods in Python are very similar, but that they are also slightly different. Now, let's see both of them in action and learn about useful functions and methods that are attached to strings. Let's jump back into a Jupyter notebook and write some code.

Let's define a variable and assign to it the string “Good morning”. I’ll call the variable greeting. You can call a variable whatever you’d like, but remember to make the name descriptive of what it stores. In this case, I'm storing a greeting so greeting seems like an appropriate name.

greeting = 'Good morning'

# this prints the string 'Good morning'
print(greeting)

We have two ways to see what we store in this greeting variable. One is to type it out and then press shift+enter. The other is to use the print function: print(greeting). The first method will only work in Jupyter notebooks and in the Python interpreter command line. It will not work in scripts. That's important to remember. When writing scripts using an editor, like Sublime or Atom, you must use the print function if you want to print something to the screen. But for now we're in a Jupyter notebook so we can choose either method.

Remember indexing? We can use it here to get characters at specific locations in the string. Let's do greeting[5], which returns “m”. Remember, a space is a character as well so we need to count it, and the counting starts at 0. That’s why “m” is at index 5. You'll also remember that if we try to get a character at a location that doesn't exist, we'll get an error. So if I try to run greeting[20], I'll get an IndexError. I mentioned earlier that one way to make sure a location always exists is by checking the length of a string. But how do we do that?

# this returns the string "m"
greeting[5]

# this returns an IndexError
greeting[20]

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-28-d4f2b1cd16e7> in <module>()
----> 1 greeting[20]

IndexError: string index out of range

Python has a helpful built-in function that we can use. It's called len, which should be easy to remember because it's short for length. The function len returns the length of an input. If I write len(greeting), I get 12.

# this returns 12, which is the length of the string 'Good morning'
len(greeting)

This sometimes confuses people in the beginning. The function len doesn't return the number of characters in the name of the variable. In this case, the word greeting, which is the name of my variable, has 8 characters. Instead, len looks at the value stored in the variable named greeting and returns the number of characters in that value. My variable contains the phrase “Good morning”, which has exactly 12 characters: 4 in the word Good, 1 space (we count the spaces too), and 7 in the word morning. So that's 4+1+7, which is 12.

By the way, if you want to know the number of characters in the word greeting, you should do this instead: len('greeting'). While doing this, the Jupyter notebook will nicely highlight to show that because I included quotes, I must be talking about a string. Python will count and return the number of characters in the string, which in this case is 8.
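
To tie this back to indexing, here is a minimal sketch of checking a position against the length of a string before using it. The names position, is_valid and last_index are only for illustration.

```python
greeting = 'Good morning'

# len measures the value stored in the variable, not the variable's name
print(len(greeting))    # 12
print(len('greeting'))  # 8

# a position exists only if it is smaller than the length
position = 20
is_valid = position < len(greeting)
print(is_valid)  # False

# counting starts at 0, so the last character sits at length - 1
last_index = len(greeting) - 1
print(greeting[last_index])  # g
```

Because counting starts at 0, the largest valid index is always one less than the length.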

Remember that len only works with objects that have a length, such as sequences. So for example, len(5) won't work. Let's try it out: len(5). The error message returned by Python is quite descriptive, and tells us that the object 5 has the data type int. In other words, because our object is an integer, it does not have a length. This error makes sense. Even outside of programming, we don’t talk about the length of a number. We instead talk about the number of digits in a number.

# this returns a TypeError
len(5)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-31-91fed648bb37> in <module>()
----> 1 len(5)

TypeError: object of type 'int' has no len()

For example, let's say we have a really big number and we want to know how many digits it has. How do we do that? We can write it as a string and then use the len function. Like this: len('1000000000000000').

# this returns 16, which is the number of digits in the parameter
len('1000000000000000')
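
If the number is already stored as an integer, one possible approach is to convert it to a string first. This sketch assumes the built-in str function, which turns a value into its string form; the variable names are made up for the example.

```python
big_number = 1000000000000000

# str converts the integer into a string of digit characters
digits = str(big_number)

# len can now count the characters, which are the digits
digit_count = len(digits)
print(digit_count)  # 16
```

Note that this simple version only makes sense for positive whole numbers, since a minus sign or a decimal point would be counted as a character too.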

So, the len function tells us how many characters are in a sequence. In this case, the sequences we're talking about are sequences of characters—also known, of course, as strings. Before we move on, I’ll mention one other tip for using variables in general. Variables must be defined before anything else in the program attempts to read their values. This should hopefully make sense. We can't get the value of a variable that was never defined. For example, try to type len(my_var) and underneath define my_var = 'hi'. This won't work.

# this will throw a NameError, because the variable my_var is not defined
len(my_var)
my_var = 'hi'

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-33-d842508b60b1> in <module>()
----> 1 len(my_var)
      2 my_var = 'hi'

NameError: name 'my_var' is not defined

Why? When the Python interpreter reads code, it always reads from top to bottom. So when it tries to run the code in this cell, it first encounters the command len. When it tries to run the command, it tries to get the value from a variable called my_var. But at this point, there is no variable called my_var. So the Python interpreter will throw an error. However, if we put the definition of the variable my_var above the line that calls the len function, the function will work.
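
For comparison, here is the same pair of lines in the order that works, with the definition first:

```python
# define the variable first...
my_var = 'hi'

# ...then read it; this outputs 2
print(len(my_var))
```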

 

Operator Overloading

Next we’ll go over the concept of operator overloading: what it is, why we use it, and how it works when it comes to strings. Back to the Jupyter notebook!

We’ll start with string concatenation. Concatenation is the process of merging two or more sequences. So, how do we do this? It's simple. To concatenate, we use the plus sign, like this: 'hi' + 'ciprian'. Using plus signs in this way returns the word “hi” merged with “ciprian”. In this instance, there is no space in between the words because concatenation doesn't add any extra characters, and space is an extra character. Concatenation simply takes one string and puts it immediately next to the other.

# in Jupyter notebook, this outputs the string "hiciprian"
'hi' + 'ciprian'
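
If you do want a space between the two words, you have to supply it yourself. A small sketch, with variable names chosen only for this example:

```python
# concatenation adds nothing on its own
no_space = 'hi' + 'ciprian'
print(no_space)  # hiciprian

# the space is just another string that we concatenate in the middle
with_space = 'hi' + ' ' + 'ciprian'
print(with_space)  # hi ciprian
```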

At this point you might be saying, Wait, I thought you can't add two strings? And it's true, we can't add two strings from a mathematical point of view. When Python was written, the developers could have used any other sign for string concatenation. However, they decided to use the plus sign because it was easy. I guess it can be argued that similarities exist between the concept of adding two numbers and combining two strings together. Maybe. And maybe not.

But this process, in which the plus operator was repurposed, is called “operator overloading”. It's a fun term, in my opinion. By overloading the plus operator, you're making it do more than it was originally supposed to do.

This raises the question: is there a minus operator overloading for strings? Let's try it out: 'hi' - 'ciprian'.

# this throws a TypeError
'hi' - 'ciprian'

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-35-4dc0a46a80fb> in <module>()
----> 1 'hi' - 'ciprian'

TypeError: unsupported operand type(s) for -: 'str' and 'str'

In this case, we get an error. Python has no concept of subtracting one string from another, so the minus sign is not defined for strings. If you want to concatenate two strings, use the plus sign.

We can also concatenate using string variables.

I’ll input greeting + ' Ciprian'. Notice that this time I added a space at the start of the second string.

# this outputs 'Good morning Ciprian'
greeting + ' Ciprian'

This will work because the Python interpreter replaced the variable named greeting with the value it stored, and then concatenated that value with the string ' Ciprian'. Nothing happened to the value stored in the greeting variable in this process. The value in the variable remains the same after concatenation.

If we want to update the value stored in greeting, we can use variable assignment. In the previous example, I can take the exact same concatenation and assign its result back to the greeting variable. Doing so looks like this: greeting = greeting + ' Ciprian'. Now if I output greeting, we should see that greeting stores a new value, which is “Good morning Ciprian”.

greeting = greeting + ' Ciprian'

# this now outputs 'Good morning Ciprian'
print(greeting)

For fun, let's try running this cell containing the variable assignment again.

greeting = greeting + ' Ciprian'

# this now outputs 'Good morning Ciprian Ciprian'
print(greeting)

What happens? The Python interpreter will return two Ciprians. After the last variable assignment, the value stored in greeting was changed from “Good morning” to the string “Good morning Ciprian”. When I reran this cell, I took that “Good morning Ciprian” and concatenated it with another “Ciprian”. This returns the result “Good morning Ciprian Ciprian”.

Because plus is used for both strings and numbers, we must be careful about data types. Let me show you what I mean.

# this outputs 3
1 + 2

# this outputs '12'
'1' + '2'

If I type 1 + 2, I get 3. That makes sense: 1 is an integer number and 2 is an integer number. Adding the number 1 to the number 2 yields 3. What if I do the same thing, but this time with strings? '1' + '2' returns '12'. That's because in this case, '1' is a string and '2' is also a string. String concatenation doesn't do math. Instead, it simply merges the two strings together. The resulting '12' is not an integer, even though it may look like one. It is a string.

The type function can come in handy for paying attention to data types. If you're reading data from user input and the user is mischievous, they could be giving you strings where you expected integers. This can cause unexpected results. (This is a good time to introduce you to one of the key tenets of safe computer programming: do not ever, under any circumstances, trust the user of your program. That's how vulnerabilities get created and computers get hacked.)
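
Here is a rough sketch of how type can reveal the problem. The variable user_input is a stand-in for a value typed by a user, and the built-in int function is used as one way to convert a string of digits into a number.

```python
# pretend this value came from a user; it looks like a number but is a string
user_input = '1'

print(type(user_input))  # <class 'str'>
print(type(1))           # <class 'int'>

# with a string, plus concatenates
print(user_input + '2')  # 12

# after converting to an integer, plus does math
number = int(user_input)
print(number + 2)  # 3
```

Keep in mind that int raises a ValueError if the string does not contain a valid whole number, so a mischievous user can still cause trouble at the conversion step.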

Let's go back to operator overloading. We saw that Python repurposed the plus sign to give us an easy way to merge two strings. And we also saw that it didn't do the same for the minus sign. Another operator that Python overloads for strings is the multiplication operator. Let me show you how.

Let's define a new string variable called letter and assign it the string “a”. What happens if I multiply this variable by 3, for example? In this case, I get “aaa”—three “a”s in a row! What happens if I multiply it by 5? Similarly, I get five “a”s.

letter = 'a'

# this outputs 'aaa'
letter * 3

# this outputs 'aaaaa'
letter * 5

# this outputs an empty string ''
letter * 0

You can use the multiplication operator to concatenate a string to itself as many times as you want. In this case, I'm concatenating the string “a” to itself three or five times, respectively. Try it out on your own. What happens if I multiply a string by 0? In this case, I get an empty string, which makes sense. Zero times anything is always zero.
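
Multiplication works with strings of any length, not just single letters. As a small illustration (the variable names here are invented for the example), it can build a divider line or repeat a whole word:

```python
# repeat a single character to build a divider line
separator = '-' * 20
print(separator)
print(len(separator))  # 20

# repeat a longer string
laugh = 'ha' * 3
print(laugh)  # hahaha
```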

 

String Methods

In this section, we'll go over three useful string methods: upper, lower and split. Let's jump back into our Jupyter notebook.

In the previous section, we learned that we can use the multiplication operator to merge a string with itself a number of times. We also learned that we can use the plus sign to merge strings together.

 

.upper()

We also previously defined methods as functions that are attached to an object. To learn more about methods, let's start back with our variable named greeting. From the operator overloading section, we have “Good morning Ciprian Ciprian” stored as a value in greeting. What are some other ways we can manipulate this string? If I type greeting followed by a period (greeting.) and hit tab, the Jupyter notebook will display a list. The list shows all of the methods and properties that are available to my variable. I’ll scroll all the way down and select upper from this list. Then, I’ll follow upper with open and close parentheses. In the end we get greeting.upper(). Running this code, I see that my text is now printed in all uppercase letters. That's what the method upper does; it takes a string and returns an uppercase version of it.

# this outputs 'GOOD MORNING CIPRIAN CIPRIAN'
greeting.upper()

Let's focus on the syntax of greeting.upper(). The first part, greeting, is just the name of the variable. The last part, upper(), looks very much like other functions we've seen: we have the name of the method followed by open and close parentheses. We don't have any parameters between the parentheses because the upper method doesn't require any. And that's a good thing to remember: not all functions and methods require parameters. Those that do require parameters don't necessarily require only one parameter. Functions and methods can require any number of parameters, including none.

The upper method looks like the functions we've seen before, except that in between the variable name and the method name we have a period. This again is just a notation. The period is a way to tell the Python interpreter to run the upper method that is attached to this object called greeting and provide the result. Using the method notation may look weird at first, but you'll get used to it with practice.

Can we call upper on a different object? Of course! Let's try it on the variable named letter, which we previously assigned to the lowercase string “a”. I’d suggest quickly outputting letter to double check its value first. Then, I’ll run letter.upper(). This returns an uppercase “A”. Note that there is no space before or after the period; it must be immediately after the object name and right before the method name.

# this returns 'A'
letter.upper()

Keep in mind that calling this method doesn't actually change the value stored in the objects. For example, if I print greeting, its value is still “Good morning Ciprian Ciprian”. And if I print letter, its value is still lowercase “a”. If you want to change the value stored in a variable, you need to use variable assignment. As before, this looks like greeting = greeting.upper(). In this case, we’re updating greeting to store the uppercase version of our string.

# running greeting.upper() DOES NOT change the string stored in the variable greeting
# instead, to change the original string, we must run this:
greeting = greeting.upper()
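
If you would rather keep the original value around, one option is to store the result in a second variable. The name shouted below is just an example.

```python
letter = 'a'

# upper returns a new string; the original is left untouched
shouted = letter.upper()

print(letter)   # a
print(shouted)  # A
```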

And even if a method or function doesn't have parameters, we must still use the parentheses. If I type letter.upper, the Python interpreter will tell me that upper is a function attached to a string. I need to use the parentheses to actually execute the method: letter.upper().
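
The difference can be sketched as follows. This example assumes the built-in callable function, which reports whether something can be executed with parentheses; method_itself is a made-up name.

```python
letter = 'a'

# without parentheses we only get the method itself; nothing is executed
method_itself = letter.upper
print(callable(method_itself))  # True

# adding parentheses actually runs the method
print(method_itself())  # A
print(letter.upper())   # A
```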

 

.lower()

Similar to upper, Python also has a lower method. Let's take our all uppercase greeting and use the lower method to print the lowercase version of it: greeting.lower(). This returns “good morning ciprian ciprian”.

# this returns "good morning ciprian ciprian"
greeting.lower()
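
One place where lower tends to be handy is comparing text without caring about capitalization. The sketch below uses the == operator to check whether two strings are equal; the variables typed and expected are hypothetical.

```python
typed = 'GOOD Morning'
expected = 'good morning'

# the strings differ in capitalization, so they are not equal
print(typed == expected)  # False

# lowering the typed text first makes the comparison ignore case
print(typed.lower() == expected)  # True
```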

 

.split()

The third string method is the split method. split allows us to take a string and break it up into multiple parts, programmatically. This method is particularly useful for extracting words from a phrase. For instance, it allows me to get the list of words that make up my greeting. To use it, I type greeting.split(' '). Doing so generates a list of words that are in my string. We'll talk about lists very soon, so don't worry about these brackets and commas for now. We'll come back to them.

# this returns the list of words in greeting
# the output is ['GOOD', 'MORNING', 'CIPRIAN', 'CIPRIAN']
greeting.split(' ')

In contrast to upper or lower, split takes one parameter. The parameter is the character that you want to use when splitting the string. In the previous example, I chose to split by a space. But you can, if you want, split by a different character. For example, greeting.split('O') gives me this.

# this returns ['G', '', 'D M', 'RNING CIPRIAN CIPRIAN']
greeting.split('O')

Notice that there are no letter “O”s in any of these parts. That's because the character chosen for splitting is eliminated when doing the split.

split also doesn't modify the value stored in our string. If I output greeting, I should see that it’s still intact with all the “O”s and spaces in place.

# this still outputs 'GOOD MORNING CIPRIAN CIPRIAN'
print(greeting)

And although I mentioned earlier that split takes one parameter to split by, that parameter is optional. If you don't include it, split will break the string apart wherever it finds whitespace, such as spaces, tabs, or new lines. For a string whose words are separated by single spaces, that gives the same result as splitting by a space. So for example, greeting.split() again gives me a list of all the words in my string.

# if you don't pass a parameter to the split method, it splits on whitespace by default
# this outputs the list of words in greeting ['GOOD', 'MORNING', 'CIPRIAN', 'CIPRIAN']
greeting.split()
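
Calling split with no parameter and calling split(' ') are not always interchangeable. A short sketch with a made-up string that contains two spaces in a row shows the difference:

```python
# note the two spaces between the words
spaced = 'GOOD  MORNING'

# splitting by exactly one space leaves an empty string between the two spaces
print(spaced.split(' '))  # ['GOOD', '', 'MORNING']

# splitting with no parameter treats the run of spaces as a single separator
print(spaced.split())  # ['GOOD', 'MORNING']
```

For extracting words from a phrase, the version without a parameter is usually the more forgiving choice.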

This concludes our exploration of strings and their properties. Strings are powerful and widely used in programming. Spend some time to practice what we’ve learned and get comfortable with using the various properties and methods of strings. And don't limit yourself to what I showed you! Try out some of the other methods and see what they do. In the next article, we'll look at another powerful data type called a list. Thanks for following along!

Ciprian Stratulat

CTO | Software Engineer

Ciprian is a software engineer and the CTO of Edlitera. As an instructor, Ciprian is a big believer in first building an intuition about a new topic, and then mastering it through guided deliberate practice.

Before Edlitera, Ciprian worked as a Software Engineer in finance, biotech, genomics and e-book publishing. Ciprian holds a degree in Computer Science from Harvard University.