Intro to Programming: What Are Strings in Python

Strings play a big role in our computer programs, as a lot of programs involve working with text in some form or another.
By Ciprian Stratulat • Updated on Oct 12, 2022

Welcome back to the latest article in my Intro to Programming series. Today I’ll talk about strings in Python.

 

 

What is the Difference Between Strings and Variables?

Happy Birthday letters representing a string

Image Source: Cristina Hernandez, Unsplash, https://unsplash.com/photos/jiZ6Vs-U6b8

Use this image as a mental model for the data type string. Strings are just a bunch of characters on—you guessed it—a string. I’ll talk more on that in just a bit.

Since so much of our world is text-based (and has been since the invention of writing), you can imagine that strings play a big role in our computer programs. Many problems where computers are useful involve working with text in some form or another. Securing a good grasp on the concept of strings is key to using Python. So, let's see what they're all about!

By way of definition, strings are ordered sequences of characters surrounded by single or double quotes. You'll see in just a moment why Python supports both single and double quotes. Spoiler alert: it's not to intentionally confuse us. In fact, it actually makes things easier.

Here are a few examples of using quotes in Jupyter notebook:

"hello"

'hi2022'

"my name is Ciprian"

You can use either single quotes or double quotes in Python. In between quotes, you write the characters that make up your string.

Note that the surrounding quotes (single or double) are not part of the string output. So why put the quotes there at all?

 

 

Example 1 

Let's go over an example to answer this question in Jupyter notebook. See Example 1 below:

hi = 5

my_string = hi

# this prints 5
print(hi)

# this also prints 5
print(my_string)

In this example, I define a variable named hi and assign it the value 5, which is an integer. Even though it’s a strange name, hi is a perfectly valid name for a variable. Remember, your variable names should indicate what they're used for. I can't think of what a variable named hi could possibly store, but the name is valid, so for the sake of example I'll use it.

On the next line, I define a variable called my_string and assign it the value of the variable hi, which is 5. When I print the value of the variable hi, I get 5, and when I print the value of the variable my_string, I also get 5.

 

Example 2

Now let's move on to Example 2 below in Jupyter notebook:

hi = 5

my_string = "hi"

# this prints 5
print(hi)

# this prints the word hi
print(my_string)

Here again I have a variable called hi and I assign it the value 5. But on the next line, I want to assign the variable called my_string the text 'hi.' I do that by surrounding the text with quotes, in this case double quotes.

Single quotes would work just as well, as I mentioned earlier. If I forgot the quotes, the program would think that I meant to assign my_string the value that is stored in the variable named hi. If I did that, printing my_string would show 5 instead of the actual text 'hi.' With the quotes in place, printing the variable hi gives 5, and printing the variable my_string gives the actual string 'hi.'

So that's why quotes are important. Without quotes, the Python interpreter will think you’re referring to some variable and will try to find the value of that variable. If it can't find the variable, it will error out. Even worse, if it finds it, you’ll get unexpected results. So when you want to use strings, always remember the quotes! They’re very important.
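To make both outcomes concrete, here is a minimal sketch. It assumes that no variable named hello has been defined earlier in the notebook; the names with_quotes, without_quotes, and error_name are made up for illustration. The try and except lines are there only so the sketch keeps running after the error.

```python
# A minimal sketch; assumes no variable named hello exists yet.
hi = 5

with_quotes = "hi"   # a string made of the letters h and i
without_quotes = hi  # the value stored in the variable hi, which is 5

print(with_quotes)     # prints hi
print(without_quotes)  # prints 5

# Forgetting the quotes around a name that was never defined raises a NameError.
try:
    my_string = hello
except NameError as error:
    error_name = type(error).__name__
    print(error_name)  # prints NameError
```

The second case is the sneaky one: the code runs without complaint, but the variable holds 5 instead of the text.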

 

What Are Quotes and Escape Sequences?

Let’s play with some strings in Python and learn how to use quotation marks inside a string. I’ll also show why you can use either single quotes or double quotes for strings.

Let's start in a Jupyter notebook. In a Jupyter cell, I can type out a string (always between quotes). If I press Shift + Enter, Jupyter will print the whole string back. But I can also do something a bit fancier. Let's assign this string to a variable called greeting.

So now, if I print the value of the variable called greeting, I should see the string printed back:

# this is a string
"happy birthday"

# let's assign this string to a variable
greeting = "happy birthday"

# this prints the words happy birthday
print(greeting)

Let’s also create a variable called question. I'll assign it the string 'what day of the week is it?'

Next, I can create a variable called answer, and assign it to say 'yay, it's Friday.' Notice that if I use single quotes to wrap this string, I get an invalid syntax error.

The error points at the 's' right after the single quote in the word “it’s.” Why is that?

# assign the question 'what day of the week is it?' to a variable
question = 'what day of the week is it?'

# assign the answer 'yay, it's Friday!' to another variable.
# when using single quotes around this string, running this line will throw a syntax error
answer = 'yay, it's Friday!'

In this case, the Python interpreter understands that I’m trying to create a string because I’ve typed an open quote and a close quote, which mark a string. However, as far as the interpreter is concerned, my string ended right after the word 'it,' because that's where it found the single quote that matched the opening quote. So the remaining text (s Friday!') immediately after that quote is nonsense to the interpreter. It doesn’t know what to do with it, which causes the error.

Jupyter notebook helpfully shows the strings in red. Doing so makes them easy to distinguish from variable names, for example, which are black. The letters 's Friday' do not appear red because they’re not in the string, because the string ended before them.

From this example, it might seem that you can't ever use single quotes inside strings when also using single quotes to wrap the string. But in fact, there are two ways out of this situation. The first is to use double quotes, which is why Python supports them! Using double quotes, I can rewrite this whole thing as "yay, it's Friday" and now it works perfectly.

So what if I have a string that contains multiple instances of double quotes, like "she said "hello""? In this case, I can't use double quotes because the string has double quotes inside of it. But, I can use single quotes! I can rewrite it as 'she said "hello"' and now it works.
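Both cases can be sketched in a couple of lines. The variable name reply is made up for this illustration.

```python
# Double quotes on the outside let the apostrophe live inside the string.
answer = "yay, it's Friday!"

# Single quotes on the outside let the double quotes live inside the string.
reply = 'she said "hello"'

print(answer)  # prints yay, it's Friday!
print(reply)   # prints she said "hello"
```

In both cases the outer quotes only mark where the string starts and ends; they are not part of what gets printed.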

 

What If You Want to Use Both Single Quotes and Double Quotes?

But what happens if you have both? Now I'm in trouble. Or maybe not!

In Python—and in many similar programming languages—a way exists to include both single and double quotes in your strings without confusing the interpreter. This method is called escaping the quote. It's perhaps a bit funny, trying to escape a quote, but that's what it's called.

You can escape the quote by simply adding a backslash before the single or double quote that is causing issues. Let me show you: 'she said "hello, it\'s friday"'.

You might notice here that I have both single and double quotes in my string and I'm using single quotes on the outside, to wrap the string. So in this case, I simply put a backslash before the single quote inside the string and all is good.

This might seem a bit odd at first. Perhaps you're wondering how Python knows not to print the backslash. It's one of the rules that was built into the Python interpreter. Python actually considers \' to be a single character, not two, and the same is true for \". These are called escape sequences.
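If you want to check this for yourself, here is a small sketch. It borrows the len function, which is covered later in this article and returns the number of characters in a string. The variable names mixed and escaped_quote are made up for illustration.

```python
# Single quotes wrap the string, so the apostrophe inside is escaped.
mixed = 'she said "hello, it\'s friday"'
print(mixed)  # prints she said "hello, it's friday"

# The escape sequence \' is stored as one character: the quote itself.
escaped_quote = '\''
print(len(escaped_quote))  # prints 1

# No backslash ends up inside the string.
print("\\" in mixed)  # prints False
```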

There is another escape sequence that I want to show you. Let’s say you want to print a new line. You want to say "hello world," but more specifically you'd like to print “hello” on one line, and “world” on a separate line. You might be tempted to just do this: type print("hello, hit Enter, type world, and then close the quotes and the parenthesis.

Well, that won't work. In fact, doing so will get us another syntax error. That’s because the Python interpreter is expecting the string to end on the same line it started. In other words, it all needs to be on one line. So how do you get a new line? By typing a backslash n, like this: print("hello\nworld").

When the code runs, the \n gets replaced by the newline character. The newline character is the character you'd get when hitting Enter on your keyboard to start a new line.
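Here is a short sketch of the newline escape sequence in action. It uses len and the splitlines string method, neither of which has been introduced yet, purely to show that \n is stored as a single character that separates the two lines.

```python
message = "hello\nworld"

# Prints hello on one line and world on the next.
print(message)

# The string holds 5 letters, 1 newline character, and 5 more letters.
print(len(message))  # prints 11

# splitlines breaks the string apart at each newline character.
lines = message.splitlines()
print(lines)  # prints ['hello', 'world']
```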

So now, you've learned how to write strings and about escape sequences that allow you to handle situations which would otherwise be tricky. Next, you'll learn about some cool properties of strings and how to use them.

 

What Are Indexes in Python? 

Let’s continue exploring the string data type. In particular, I'll talk next about indexing, including what an index is, how it works, and why you should care about it.

Let's go back to the image I looked at earlier. It shows the words "Happy Birthday" on a string. The image is a good analogy for this data type: a string in Python is a sequence of characters, much like letters hung on a literal string.

Happy Birthday letters representing a string in Python

Image Source: Cristina Hernandez, Unsplash, https://unsplash.com/photos/jiZ6Vs-U6b8

So let's write “Happy Birthday” out as a Python string. The end result should look like the words “happy birthday” between quotes.

In this example in Jupyter notebook, I'll use double quotes, but single quotes would work as well:

"happy birthday"

Let’s begin talking about indexing by looking at the word "index."

One definition for the word "index" is an indicator or a sign. Indexes can be, for example, indicators of location. Before we had digital records, libraries had index cards. The index cards told you where in the library you could find a particular book, among other details about the book itself. This is the meaning of "index" that I'll focus on.

An index, in the context of Python programming, is an integer number that indicates the location of something.

Strings in Python: Happy Birthday indexes

Image Source: Edlitera

This may sound a bit abstract, but bear with me for a moment.

Let's go back to the "happy birthday" string. Underneath it, let's write the numbers 0, 1, 2, 3, all the way to 13. By doing so, you can assign each character in the string a number, including the space between the words “happy” and “birthday." These numbers indicate the location of each character inside the string.

For example, the letter "h" is at location 0. In Python, as in many other programming languages, you start counting at 0. Typically, you'd refer to "h" as the first letter. But programmatically, you say it is at location 0.

Similarly, letter "b" is at position 6 and the space character (between happy and birthday) is at position 5.

Strings in Python: Happy Birthday indexes

Image Source: Edlitera 

All of these numbers are actually indexes. That is, they are integer numbers that indicate the location of each character inside the string. This concept is very similar to how books have indexes, which really are just lists of locations where you can find certain words or topics in the book. An index is an integer number that indicates the location of an item that is part of a sequence.

 

What Are Reverse Indexes in Python?

Let’s revisit the index definition I gave earlier to make it a bit more specific: an index is an integer number that indicates the location of an item that is part of a sequence. The reason I wanted you to see this more generalized definition is because indexes are relevant not just to strings, but to other data types as well, as you'll see later. So understanding indexes in this general context is useful.

Now, many programming languages will support this type of index where you start on the left and count toward the right of a string. But Python goes beyond that, and actually supports a reverse index as well.

To get the reverse index of a character in a string, you start at the end of the string and begin counting at -1, like so: "y" here is at location -1, "a" is at location -2, "d" is at location -3, and so on.

Strings in Python: Happy Birthday indexes and reverse indexes

Image Source: Edlitera

 

This is called the reverse index. The reason why you use negative numbers in the reverse index is because you already used positive numbers in the forward index.

For example, if you used 0 again for the last character, you'd have two items living at the same address. This would be confusing. So, when you count left to right, you start at 0 and go up; when you count right to left, you start at -1 and go down.

So why does Python have a reverse index and a forward index? Because Python was built to make things easy. If you want, for example, to know the first character in a string, all you have to do is ask the string to return the character at index 0. If you want to know the last character in a string, you ask the string to return the character at index -1.

I'll go over how exactly to do that in just a moment. For now, whenever you see a string, imagine numbers underneath each character. These are the indexes.

 


How Indexes Work in Python Strings

Let’s take a look at how indexes work in the context of Python strings.

Indexes are important because they allow you to inspect a string and extract from it characters that are located at specific locations.

To show this in action, let's write some code:

greeting = "happy birthday"

# this will print the words happy birthday
print(greeting)

Let's begin by assigning the string “happy birthday” to a variable. Let's call this variable greeting and say it's equal to "happy birthday." Now, you can output greeting. If you are following along in a Jupyter notebook, remember that Jupyter notebooks allow you to type the name of the variable and it will output its value.

So now I'll ask the question, what is the first letter in this string? Remember, the first letter is at index 0. But how do you communicate to Python that you want the character that is at index 0 inside the greeting variable?

For that, you use this special notation: greeting[0].

# this outputs the letter "h" from the string "happy birthday"
greeting[0]

This is just a notation. It's something that's easy enough for humans to write and easy enough for the Python interpreter to understand. Python developers could have come up with any number of ways to write it, but this is what they decided on—so don't worry too much about it. Just remember to use those square brackets. Many programming languages use the same notation, so at least you won't have a hard time getting used to it again if you decide to learn another programming language in the future.

So in Python, to get the character that is located at a certain index in a string, you simply write the variable that stores the string, followed by square brackets. In between the brackets, you write the index.

Now let me show you something even cooler: the index itself can be a variable.

I’ll define a variable x and set it to 0. Right below, I’ll output the character that is located at index x. That is to say, the character at index 0 inside the greeting variable:

x = 0

# this also outputs the letter "h" from the string "happy birthday"
greeting[x]

The output gives the same thing: the character "h." But what if I change x to 6 and run this cell again?

x = 6

# this outputs the letter "b" from the string "happy birthday"
greeting[x]

I get the character “b," which is correct as you saw in the earlier image.

If you're following along in your own Jupyter notebook, try a few on your own. I'm going to write a few more myself: greeting[2] is “p," which makes sense.

If you’re confused because “p” is actually the third letter in the word “happy," remember that in Python you always start counting at 0. It might take a bit to get used to it, but the more code you write, the more it will make sense. You might even end up numbering your shopping list starting at 0, like I do.

Let's try one more. This time, I'll try the reverse index. Calling greeting[-1] should return the very last character in the string:

x = -1

# this outputs the letter "y", the last one in the string "happy birthday"
greeting[x]

And it does! I get the character “y" and calling greeting[-2] returns character “a." Now let's try greeting[5]. The character at location 5 is actually a space and that's what is returned. 
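To see how the forward and reverse indexes line up, here is a small sketch. It relies on the fact that "happy birthday" has 14 characters, so the forward indexes run from 0 to 13 and the reverse indexes run from -1 to -14. The names first and last are made up for illustration.

```python
greeting = "happy birthday"

first = greeting[0]   # forward index of the first character
last = greeting[-1]   # reverse index of the last character
print(first)  # prints h
print(last)   # prints y

# Index 13 and index -1 point at the same character, the final "y".
print(greeting[13] == greeting[-1])  # prints True

# Index 0 and index -14 point at the same character, the first "h".
print(greeting[0] == greeting[-14])  # prints True
```

Every character therefore has two addresses, one counted from the left and one counted from the right.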

Now, you might be wondering about these results: the space or the “a” above, what data type are they? Remember, every piece of data in a Python program has a specific data type. So does a special single character data type exist that you haven't yet heard about? The answer is no.

These characters are still a string, but they are strings that have a single character. Each string returned is the character that happens to be at the location you inquired about.

If you're curious, you can check the data type of any variable or piece of data by using the type function. I haven't yet talked about what functions are, but for now you can think of them as a synonym for instructions. Functions are named commands that instruct the computer to do a certain action. Python's print, for example, is a function: it prints whatever you put between the parentheses. The Python type function returns the type of whatever you put between the parentheses.

Let me show you:

# this returns str, which is short for string
type("hello")

First, I'll start with some easier examples. Inputting type('hello') returns str. The output str, if you remember the data types table I showed you in an earlier article, is short for string. You're seeing again the use of parentheses. You saw them before when I used the print function.

Let's try a few more examples to see how type works. Inputting type(5), you see that it's an int:

# this returns int, which is short for integer
type(5)

Inputting type(11.3), you can see that it’s a float.

# this returns float
type(11.3)

The type function is not very useful when the data type of an input is obvious.

For example, “hello” is clearly a string and 5 is clearly an integer. It is very useful, however, if you have a variable and you don't know the type. If you set x = 15, and right below it you write type(x), you get int. Even this might not be very impressive because you know x is 15 and it is an integer. But what if you get the value for x from a file or a database? When writing a program, you can’t always anticipate x’s data type. So if you want to know, you can use the type function.
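Here is a small sketch of a case where type earns its keep. The two values below are made up; imagine that one arrived as text from a file and the other came out of a calculation. They look the same when printed, but they have different data types.

```python
# Hypothetical values: one read as text, one computed as a number.
from_file = "15"
computed = 15

print(from_file)  # prints 15
print(computed)   # prints 15

# type tells the two apart.
print(type(from_file))  # prints <class 'str'>
print(type(computed))   # prints <class 'int'>
```

Note that print(type(...)) shows the longer form, such as <class 'str'>, while a Jupyter cell that ends with type(...) displays just str.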

Remember the single character strings? If you want to convince yourself that they are indeed still strings, you can write type and then between parentheses write greeting[0]. The character at index 0 has the data type you want to check.

The type function returns str:

# this returns str
type(greeting[0])

The type function can be very useful, but you probably won't use it much in the beginning. But it's good to know about.

It's also good to know that when an index into a string returns a character, that piece of data has a data type of string as well.

I want to cover one more thing about indexes.

Let's go back to our greeting. If I output it again you see the phrase "happy birthday." But what happens if I try something silly, like ask Python to tell me what the character at location 100 is?

# this returns IndexError, because the string "happy birthday" does not contain index 100
greeting[100]

If I type greeting[100], I get an IndexError. The error message is pretty clear: the variable greeting doesn't store a string that is more than 100 characters long, so nothing exists at location 100. I'm trying to access a location that doesn't exist.

These index out of range errors actually happen more frequently than you might think. So be careful when indexing into a string. Make sure that location exists. How might you be sure that an index location exists?

Well, one way is by looking at the total length of the string, and making sure you never ask for an index that's higher than the string's last index. I'll cover that in the next section. But for now, remember that if you use an index for a location that doesn't exist in a string, you'll get an error.
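As a rough preview, the check could look like the sketch below. It assumes the index is not negative and uses the len function and an if statement, neither of which has been covered yet. The names position and result are made up for illustration.

```python
greeting = "happy birthday"
position = 100  # a made-up index that may or may not exist

# Valid forward indexes run from 0 up to, but not including, the length.
if 0 <= position < len(greeting):
    result = greeting[position]
else:
    result = None  # nothing lives at that location

print(result)  # prints None
```

With position set to 6 instead, the same code would store the letter "b" in result.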

 

What Is String Slicing in Python?

Next, let’s learn about string slicing in Python. The actions described in this section are done in the same Jupyter notebook as before.

Previously, you saw that indexing allows you to see a character that is at a given position in a string. So rather than asking about a single position, can you ask about multiple positions? For example, can you ask to see the characters located in a string between index 2 and index 5?

You actually can! This is called slicing. The Python notation for slicing is similar to that of indexing. In the previous section, I defined a variable called greeting and I set it to the string “happy birthday." To get the characters located in this string between index 2 and index 5, you can write greeting[2:5].

So again, as with indexing, I'll use square brackets. Inside them, I'll write the index I want to start at, followed by a colon, and then the index I want to stop at. The result is “ppy."

# this outputs "ppy"
greeting[2:5]

Let's double check that.

I know that in the phrase "happy birthday," “h” is at index 0, “a” is at index 1, and “p” is at index 2. The next “p” is at index 3 and “y” is at index 4. The Python interpreter returned another string that contains the characters that are at indexes 2, 3, and 4. But why didn't I get the character at index 5 as well? After all, it seems that I asked for all of the characters between index 2 and index 5. Slicing returns the characters starting at the first index, which is 2 in my example, and up to but not including the stop index, which is 5.

Let's try another example. Inputting greeting[0:2] will return “h” and “a." The character “h” is at index 0, and “a” is at index 1. Again, I got all the characters starting at index 0 and up to but not including index 2.

You might forget this from time to time, but that's okay. The more code you write, the more it will become fixed in your memory. In fact, try slicing a string of your own choice in your Jupyter notebook. Repetition and practice are the best ways to get comfortable with these notations.
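One way to remember the rule is that the number of characters in a slice equals the stop index minus the start index. The sketch below checks this with len, which is covered later in this article. The name piece is made up for illustration.

```python
greeting = "happy birthday"

piece = greeting[2:5]
print(piece)  # prints ppy

# Three characters come back, because 5 - 2 is 3.
print(len(piece))  # prints 3

# The character at the stop index, a space, is not part of the slice.
print(greeting[5] == " ")  # prints True
print(" " in piece)        # prints False
```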

I’ll end this section with a couple more tricks that will come in handy. First, if you're slicing a string and want to start at the beginning and then go up to a given index, you can actually skip writing the starting index. Look at this example code: greeting[:2]. When inputting greeting[:2], I get the same result as greeting[0:2].

In this case, the Python interpreter assumes that it should start slicing at the beginning of the string:

# this outputs "ha"
greeting[0:2]

# this also outputs "ha"
greeting[:2]

# this outputs "ppy birthday"
greeting[2:]

# this outputs the entire string "happy birthday"
greeting[:]

Similarly, when you want to go all the way to the end of your string, you can skip the second index like this: greeting[2:]. In this case, the Python interpreter will start at index 2 and go all the way to the end of the string.

So what happens if you skip both the first and last index and just write greeting[:]? Well, as you might expect, Python will return a slice of the string that starts at the beginning of the string and goes all the way to the end. In other words, the result will be the entire string.

This particular notation is rarely used, because the simplest way to return the whole string doesn’t require any slicing. Just call the variable that the string is in, without any square brackets. I encourage you to play around with string slicing. Try multiple combinations and see if they make sense.
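Here is a short sketch that ties these shortcuts together. It joins two strings with the + operator, which has not been covered yet. The names left and right are made up for illustration.

```python
greeting = "happy birthday"

left = greeting[:5]   # from the beginning up to, but not including, index 5
right = greeting[5:]  # from index 5 to the end
print(left)   # prints happy
print(right)  # prints  birthday (with a leading space)

# The two halves glue back together into the original string.
print(left + right == greeting)  # prints True

# Skipping both indexes gives back the whole string.
print(greeting[:] == greeting)  # prints True
```

Because the stop index is excluded from one slice and included as the start of the next, no character is lost or repeated when you split a string this way.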

 

What is Step Slicing in Python Strings?

Next, I'll add another step to my string slicing using Jupyter notebook. I'll use a new string for these examples, because the string “happy birthday” has some repeated characters that may cause confusion.

Let's define a new string variable called letters and assign it the string 'abcdefghij.' You can use type to double check that it is a string: type(letters). Slices of strings are also called substrings.

In this example, I'll want a substring of this string that starts at index 0 and goes all the way up to, but not including, index 9. I can do that pretty easily using letters[:9]. As discussed in the previous section, I skipped the starting index because Python will know that I want to start at index 0.

The result is the string 'abcdefghi.'

letters = 'abcdefghij'

# this returns str
type(letters)

# this returns 'abcdefghi'
letters[:9]

Now, what if I want to get every other character from this string? I don't want 'abcdefghi' but instead I want a string that looks like 'acegi.' Can I do that easily using string slicing? The answer is yes, of course! To do so, I start with the normal notation for slicing (letters[:9]). But this time, after the 9 I add another colon and I write 2 (letters[:9:2]).

Using this notation gives me the string I wanted:

# this outputs 'acegi'
letters[:9:2]

The number after the second colon is called a step. A step of 2 tells Python to slice the string and select characters in jumps of 2. But you don't have to go in jumps of 2; you can pick other steps as well.

Let's try it in jumps of 3 using letters[:9:3]. Again, this notation tells Python to start at index 0 in the string, go all the way up to, but not including, the character at index 9, and select every third character.

So letters[:9:3] returns “a," “d," and “g":

# this returns "adg"
letters[:9:3]

I mentioned earlier that if I want the whole string I can drop both the start and end index. That rule still applies here.

I can write letters[::3] to return “adgj." This result includes the “j” because I asked to go all the way to the end.

But what happens if I type letters[::1]? In this case, I get the whole string. Using letters[::1] is basically asking to slice starting at the beginning of the string, going all the way to the end of the string, while selecting every character:

# this returns "adgj"
letters[::3]

# this returns the full string, "abcdefghij"
letters[::1]
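The step also combines with a start index and a stop index. The sketch below uses the same letters string and starts at index 1 instead of index 0. The names from_b and from_b_to_h are made up for illustration.

```python
letters = 'abcdefghij'

# Start at index 1, go to the end, and take every second character.
from_b = letters[1::2]
print(from_b)  # prints bdfhj

# Start at index 1, stop before index 9, and take every second character.
from_b_to_h = letters[1:9:2]
print(from_b_to_h)  # prints bdfh

# For comparison, starting at the beginning gives the other half.
print(letters[::2])  # prints acegi
```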

That's it for indexing and slicing. These are very important concepts when it comes to inspecting sequences in general in Python, and you'll run into them again when I talk about lists. So take some time to practice your indexing and slicing, and make sure you have a good grasp on them. They might seem silly on their own, but when you write code, you will index and slice all the time. Any input or data manipulation almost always involves some slicing and indexing.

 

 

What Are The Differences Between Python Functions, Methods, and Objects?

Before you learn about useful properties and methods that you can use with strings, you should understand the similarities and differences between Python functions and methods.

In upcoming sections, I'll go over several useful functions and methods that can be used with strings. I'll cover Python functions in more detail later on, but for now, I'll work on building a basic mental model that can help you understand these concepts better.

First, let's quickly review the functions you've seen so far. The first one was print. When you want to print something on the screen, you know that you can write the word print, and then, between parentheses, you put whatever you want to print. That can either be an actual string, an integer number, or even a variable. If you're printing a variable, Python will figure out what value is stored in that variable and it'll print out that value:

# this outputs the word something
print("something")

Then you saw type, and type looks very similar to print. Instead of printing the value of a variable or of a piece of data, type returns the data type of that variable or piece of data:

# this outputs str
type("something")

 

What Are Functions in Python?

Initially and for simplification, I called these functions “instructions." Doing so helps you think of them as commands to give your computer. In other words, you tell the computer to print something, or you ask it to tell you what the data type of a given variable is.

In programming, we call these commands functions. Some of them are built into the Python programming language, but you'll soon learn how to create and use your own. When I say functions, what I mean is commands, or instructions, that I can give to the computer. While this definition is a bit of a simplification, it is a good way to think about functions at this stage.

To successfully execute certain commands, or said another way, to successfully run some functions, the computer needs to know some details.

Functions = instructions

For instance, if you tell your computer to print, it needs to know what you'd like to print. Below, the word “hello” answers the question, what do you want to print?

# Hey, print function, please print the string "hello"!
print("hello")

Similarly, if you want to know the data type, you need to provide the answer to the question, what do you want the data type of?

I specified this using the parentheses. Don't confuse parentheses with the square brackets you saw in indexing and slicing.

# Hey, type function, what is the type of 5?
type(5)

The green keywords that I used, print and type, are the function names. That's probably intuitive enough:

Strings in Python: anatomy of a function

Image Source: Edlitera

What perhaps is not as intuitive is that “hello” and “5” are called parameters (you will also often see them called arguments):

strings in python: anatomy of a function

Image Source: Edlitera

I also want to introduce you to another bit of programming lingo. When you hear people say “call a function," “run a function," or “execute a function," they all mean the same thing.

Programmer Speak

You...

   Call a function

   Run a function

   Execute a function

When you hear a programmer say they are “passing a parameter to a function," they are calling, or running the function with a parameter.
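Here is a minimal sketch of both functions being called with a parameter. The variable names message and kind are made up for illustration. It also shows one difference between the two: print puts text on the screen, while type hands back a value that you can store in a variable.

```python
message = "hello"

# Call print and pass it the variable message as a parameter.
print(message)  # prints hello

# Call type with the same parameter and keep what it hands back.
kind = type(message)
print(kind)  # prints <class 'str'>

# The parameter can also be a piece of data written out directly.
print(type(5))  # prints <class 'int'>
```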

That's all I'm going to discuss about functions for now.

 

What Are Methods in Python?

Next, let’s look at methods.

At a high level, you can think of a method as a function that's attached to an object.

Method = function that's attached to an object

 

What Are Objects in Python?

So, what's an object?

In the Python programming language, almost everything is an object, pretty much like in the real world.

All are objects:
Strings, Integer numbers, Floating-point numbers, Variables, etc.

A lamp is an object, a car is an object, a variable is an object, a string is an object, an integer number is an object, and so on.

 

What Are Methods in Python? 

For the purposes of this course, that's all you need to know about objects.

Going back to my definition of a method, a method is simply a function that's attached to a string, an integer number, or anything else you can call an object.

Method = function that's attached to a string or an integer number etc.

So, why do you need to distinguish between functions and methods?

The answer is because you call/run/execute methods slightly differently than functions. In the next section, you'll see exactly how.

We call / run / execute methods in a slightly different way from functions
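As a brief preview, the sketch below puts the two styles side by side. It uses len, a built-in function, and upper, a method that every Python string has. The variable name shouted is made up for illustration.

```python
greeting = "good morning"

# A function call: the function name comes first, the data goes in parentheses.
print(len(greeting))  # prints 12

# A method call: the object comes first, then a dot, then the method name.
shouted = greeting.upper()
print(shouted)  # prints GOOD MORNING
```

The dot is what signals that upper is attached to the string stored in greeting.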

 

 

How Do We Measure String Length in Python?

In this section, I'll start exploring useful functions that work with strings. I'll begin with the len() function, which measures the length of a string.

In the previous section, you learned that functions and methods in Python are very similar, but that they are also slightly different. Now, let's see both of them in action and learn about useful functions and methods that are attached to strings. Let's jump back into a Jupyter notebook and write some code.

Let's define a variable and assign to it the string 'Good morning'. I’ll call the variable greeting. You can call a variable whatever you’d like, but remember to make the name descriptive of what it stores.

In this case, I'm storing a greeting so greeting seems like an appropriate name:

greeting = 'Good morning'

# this prints the string 'Good morning'
print(greeting)

I have two ways to see what I store in this greeting variable. One is to type it out and then press Shift + Enter. The other is to use the print function: print(greeting).

The first approach only works in Jupyter notebooks and in the Python interpreter command line. It will not work in scripts. That's important to remember. When writing scripts using an editor, like Sublime or Atom, you must use the print function if you want to print something to the screen. But for now I'm in a Jupyter notebook so I can choose either approach.

Remember indexing? I can use it here to get characters at specific locations in the string. Let's do greeting[5], which returns "m." Remember, a space is a character as well so I need to count it, and the counting starts at 0. That’s why “m” is at index 5. You'll also remember that if I try to get a character at a location that doesn't exist, I'll get an error.

So if I try to run greeting[20], I'll get an IndexError. I mentioned earlier that one way to make sure a location always exists is by checking the length of a string. But how do I do that?

# this returns the string "m"
greeting[5]

# this returns an IndexError
greeting[20]

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-28-d4f2b1cd16e7> in <module>()
----> 1 greeting[20]

IndexError: string index out of range

Python has a helpful built-in function that I can use. It's called len, which should be easy to remember because it's short for length.

The function len returns the length of an input. If I write len(greeting), I get 12:

# this returns 12, which is the length of the string 'Good morning'
len(greeting)

This sometimes confuses people in the beginning. The function len doesn't return the number of characters that are in the name of the variable.

In this case, the word greeting, which is the name of my variable, has 8 characters. Instead, len looks at the value stored in the variable named greeting and returns the number of characters in that phrase. My variable contains the phrase “Good morning”, which has exactly 12 characters: 4 in the word Good, 1 space (count the spaces too), and 7 in the word morning. So that's 4+1+7, which is 12.

By the way, if you want to know the number of characters in the word greeting, you should do this instead: len('greeting'). The Jupyter notebook highlights the quoted text in a different color, which shows that, because I included quotes, I'm talking about a string. Python will count and return the number of characters in the string, which in this case is 8.
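Here is a small side-by-side sketch of the two cases. The names length_of_value and length_of_name are made up for this example.

```python
greeting = 'Good morning'

# No quotes: Python looks up the variable and measures the string it stores
length_of_value = len(greeting)

# Quotes: Python measures the literal string 'greeting'
length_of_name = len('greeting')

print(length_of_value)  # 12
print(length_of_name)   # 8
```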

Remember that len only works with objects that have a length, such as sequences. So for example, len(5) won't work. Let's try it out: len(5). The error message returned by Python is quite descriptive: it tells us that an object of type int has no len().

In other words, because my object is an integer number it does not have length. This error makes sense. Even outside of programming, I don’t talk about the length of a number.

I instead talk about the number of digits in a number:

# this returns a TypeError
len(5)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-31-91fed648bb37> in <module>()
----> 1 len(5)

TypeError: object of type 'int' has no len()

For example, let's say I have a really big number and I want to know how many digits it has. How do I do that?

I can write it as a string and then use the len function. Like this: len('1000000000000000').

# this returns 16, which is the number of digits in the parameter
len('1000000000000000')

So, the len function tells you how many characters are in a sequence. In this case, the sequences I'm talking about are sequences of characters — also known, of course, as strings.
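Two small uses of len follow from this, sketched below. The helper char_at is hypothetical and written only for this example; it uses an if statement and a function definition, which go a little beyond what this article covers. The second part assumes you start from an actual integer and uses the built-in str function to turn it into a string first.

```python
def char_at(text, index):
    """Return the character at index, or None if the index is out of range.

    Hypothetical helper, written only for this example.
    """
    # len(text) tells us which locations exist: 0 up to len(text) - 1
    if 0 <= index < len(text):
        return text[index]
    return None


greeting = 'Good morning'
print(char_at(greeting, 5))   # m
print(char_at(greeting, 20))  # None

# str() converts an integer to a string, so len can count its digits
big_number = 1000000000000000
digit_count = len(str(big_number))
print(digit_count)  # 16
```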

Before I move on, I’ll mention one other tip for using variables in general. Variables must be defined before anything else in the program attempts to read their values. This should hopefully make sense. I can't get the value of a variable that was never defined. For example, try typing len(my_var) and, underneath it, defining my_var = 'hi'.

This won't work:

# this will throw a NameError, because the variable my_var is not defined
len(my_var)
my_var = 'hi'

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-33-d842508b60b1> in <module>()
----> 1 len(my_var)
      2 my_var = 'hi'

NameError: name 'my_var' is not defined

Why? When the Python interpreter reads code, it always reads from top to bottom. So when it tries to run the code in this cell, it first encounters the command len. When it tries to run the command, it tries to get the value from a variable called my_var.

But at this point, there is no variable called my_var. So the Python interpreter will throw an error. However, if I put the definition of the variable my_var above the line that calls the len function, the function will work.
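For comparison, here is the same code with the two lines swapped. The variable name length is made up for this example.

```python
# Define the variable first...
my_var = 'hi'

# ...then read it. This works because my_var exists by the time len runs.
length = len(my_var)
print(length)  # 2
```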

 

What Are Concatenation, Operator Overloading, and Multiplication Operators in Python?

Next, I'll go over the concept of operator overloading: what it is, why it's used, and how it works when it comes to strings. Back to the Jupyter notebook!

 

What is Concatenation in Python? 

I'll start with string concatenation. Concatenation is the process of merging two or more sequences. So, how do I do this? It's simple. To concatenate, I use the plus sign, like this: 'hi' + 'ciprian'. Using the plus sign in this way returns the word “hi” merged with “ciprian”.

In this instance, there is no space in between the words because concatenation doesn't add any extra characters, and space is an extra character. Concatenation simply takes one string and puts it immediately next to the other:

# in Jupyter notebook, this outputs the string "hiciprian"
'hi' + 'ciprian'
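If you do want a space in the result, you have to supply it yourself. A minimal sketch, with variable names made up for this example:

```python
# Concatenation adds nothing on its own
without_space = 'hi' + 'ciprian'

# A string containing a single space can be concatenated like any other string
with_space = 'hi' + ' ' + 'ciprian'

print(without_space)  # hiciprian
print(with_space)     # hi ciprian
```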

At this point you might be saying, Wait, I thought you can't add two strings? And it's true. You can't add two strings from a mathematical point of view.

When Python was written, the developers could have used any other sign for string concatenation. However, they decided to use the plus sign because it was easy. I guess it can be argued that similarities exist between the concept of adding two numbers and combining two strings together. Maybe. And maybe not.

 

What Is Operator Overloading in Python?

This process, in which the plus operator was repurposed, is called operator overloading. It's a fun term, in my opinion. By overloading the plus operator, you're making it do more than it was originally supposed to do.

This raises the question: is the minus operator also overloaded for strings? Let's try it out: 'hi' - 'ciprian'.

# this throws a TypeError
'hi' - 'ciprian'

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-35-4dc0a46a80fb> in <module>()
----> 1 'hi' - 'ciprian'

TypeError: unsupported operand type(s) for -: 'str' and 'str'

In this case, I get an error. Python doesn't define subtraction of one string from another. So if you want to concatenate two strings, use the plus sign.
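For the curious, here is a peek at the mechanism behind this, which goes a little beyond what this article covers. In Python, the + operator on strings is backed by a special method named __add__ on the str type. The str type has no matching __sub__ method, which is why the minus sign fails. The variable names below are made up for this sketch.

```python
# '+' on two strings is handled by the str type's __add__ method
merged = 'hi' + 'ciprian'
same_thing = 'hi'.__add__('ciprian')
print(merged == same_thing)  # True

# str has an __add__ method but no __sub__ method
has_add = hasattr(str, '__add__')
has_sub = hasattr(str, '__sub__')
print(has_add)  # True
print(has_sub)  # False
```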

I can also concatenate using string variables.

I’ll input greeting + ' Ciprian', and this time I'm going to add a space before the name.

# this outputs 'Good morning Ciprian'
greeting + ' Ciprian'

This works because the Python interpreter replaces the variable named greeting with the value it stores, and then concatenates that value with the string ' Ciprian'. Nothing happens to the value stored in the greeting variable in this process. The value in the variable remains the same after concatenation.

If I want to update the value stored in greeting, I can use variable assignment. I take the exact same concatenation from the previous example and assign its result back to the greeting variable. Doing so looks like this: greeting = greeting + ' Ciprian'. Now if I output greeting, I should see that greeting stores a new value, which is “Good morning Ciprian”.

greeting = greeting + ' Ciprian'

# this now outputs 'Good morning Ciprian'
print(greeting)

For fun, let's try running this cell containing the variable assignment again:

greeting = greeting + ' Ciprian'

# this now outputs 'Good morning Ciprian Ciprian'
print(greeting)

What happens? The output now contains “Ciprian” twice. After the first variable assignment, the value stored in greeting changed from “Good morning” to the string “Good morning Ciprian”. When I reran this cell, I took that “Good morning Ciprian” and concatenated it with another “ Ciprian”. This gives the result “Good morning Ciprian Ciprian”.
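Python also has a shorthand for this pattern, the += operator. For strings, greeting += ' Ciprian' gives the same result as greeting = greeting + ' Ciprian'. A short sketch, starting from a fresh greeting:

```python
greeting = 'Good morning'

# Shorthand for: greeting = greeting + ' Ciprian'
greeting += ' Ciprian'
print(greeting)  # Good morning Ciprian

# Running the same line again appends the name a second time
greeting += ' Ciprian'
print(greeting)  # Good morning Ciprian Ciprian
```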

Because + is used for both strings and numbers, you must be careful about data types.

Let me show you what I mean:

# this outputs 3
1 + 2

# this outputs '12'
'1' + '2'

If I type 1 + 2, I get 3. That makes sense: 1 is an integer number and 2 is an integer number. Adding the number 1 to the number 2 yields 3.

What if I do the same thing, but this time with strings? '1' + '2' returns '12'. That's because in this case, '1' is a string and '2' is also a string. String concatenation doesn't do math. Instead, it simply merges the two strings together. The resulting '12' is not an integer, even though it may look like one. It is a string.

The type function can come in handy for paying attention to data types. If you're reading data from user input and the user is mischievous, they could be giving you strings where you expected integers. This can cause unexpected results. This is a good time to introduce you to one of the key tenets of safe computer programming: do not ever, under any circumstances, trust the user of your program. That's how vulnerabilities get created and computers get hacked!
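One way to guard against this is sketched below. The variable user_value stands in for text typed by a user, and the try/except construct goes beyond what this article covers. The built-in int function converts a string to an integer and raises a ValueError when the text is not a whole number.

```python
# Pretend this value came from a user; typed text arrives as a string
user_value = '2'

print(type(user_value))  # <class 'str'>

# With two strings, + concatenates instead of adding
print('1' + user_value)  # 12

# Convert to an integer before doing math
try:
    total = 1 + int(user_value)
except ValueError:
    # The text was not a whole number, so there is nothing to add
    total = None

print(total)  # 3
```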

Let's go back to operator overloading. You saw that Python repurposed the + sign to give us an easy way to merge two strings. And you also saw that it didn't do the same for the - sign.

 

What Are Multiplication Operators in Python?

Python has one more operator that got overloaded: the multiplication operator. Let me show you how:

Let's define a new string variable called letter and assign it the string 'a'. What happens if I multiply this variable by 3, for example? In this case, I get “aaa”: three “a”s in a row!

What happens if I multiply it by 5? Similarly, I get five “a”s:

letter = 'a'

# this outputs 'aaa'
letter * 3

# this outputs 'aaaaa'
letter * 5

# this outputs an empty string ''
letter * 0

You can use the multiplication operator to concatenate a string to itself as many times as you want.

In this case, I'm concatenating the string “a” to itself three or five times, respectively. Try it out on your own. What happens if you multiply a string by 0? In this case, I'd get an empty string, which makes sense. Zero times anything is always zero.
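The same operator works on longer strings too. The sketch below shows a few variations; the variable names are made up for this example.

```python
# Repeating a longer string
echo = 'ha' * 3
print(echo)  # hahaha

# A handy use: drawing a separator line of 20 dashes
separator = '-' * 20
print(separator)

# The order doesn't matter: integer * string works too
print(3 * 'ha')  # hahaha

# Multiplying by 0 gives an empty string, which has length 0
print(len('ha' * 0))  # 0
```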

 

 

What Are String Methods?

In this section, I'll go over three useful string methods: upper, lower, and split. Let's jump back into Jupyter notebook.

In the previous section, you learned that you can use the multiplication operator to merge a string with itself a number of times. You also learned that you can use the plus sign to merge strings together.

 

What Does the .upper() Method Do?

I previously defined methods as functions that are attached to an object. To learn more about methods, let's start back with my variable named greeting.

From the operator overloading section, I have “Good morning Ciprian Ciprian” stored as a value in greeting. What are some other ways I can manipulate this string? If I type greeting followed by a period (greeting.) and hit Tab, the Jupyter notebook will display a list.

The list shows all of the methods and properties that are available to my variable. I’ll scroll all the way down and select upper from this list. Then, I’ll follow upper with open and close parentheses. In the end I get greeting.upper(). Running this code, I see that my text is now printed in all uppercase letters.

That's what the method upper does; it takes a string and changes it to uppercase:

# this outputs 'GOOD MORNING CIPRIAN CIPRIAN'
greeting.upper()

Let's focus on the syntax of greeting.upper(). The first part, greeting, is just the name of the variable. The last part, .upper(), looks very much like other functions you've seen: I have the name of the method followed by open and close parentheses.

I don't have any parameters between the parentheses because the upper method doesn't require any, and that's a good thing to remember: not all functions and methods require parameters. Those that do require parameters don't necessarily require only one parameter. Functions and methods can require any number of parameters, including none.

The .upper() method looks like the functions you've seen before, except that in between the variable name and the method name you have a period. This again is just a notation. The period is a way to tell the Python interpreter to run the upper method that is attached to this object called greeting and provide the result. Using the method notation may look weird at first, but you'll get used to it with practice.

Can you call upper on a different object? Of course!

Let's try it on the variable named letter, which I previously assigned to the lowercase string “a." I’d suggest quickly outputting letter to double check its value first. Then, I’ll run letter.upper(). This returns an uppercase “A."

Note that there is no space before or after the period; it must be immediately after the object name and right before the method name:

# this returns 'A'
letter.upper()

Keep in mind that calling this method doesn't actually change the value stored in the objects.

For example, if I print greeting, its value is still “Good morning Ciprian Ciprian” and if I print letter, its value is still lowercase “a”. If I want to change the value stored in a variable, I need to use variable assignment. As before, this looks like greeting = greeting.upper().

In this case, I'm updating greeting to store the uppercase version of my string:

# running greeting.upper() DOES NOT change the string stored in the variable greeting
# instead, to change the original string, we must run this:
greeting = greeting.upper()
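The same behavior can be seen step by step with the letter variable. In this sketch, result is a name made up for the example.

```python
letter = 'a'

# upper() returns a new string; letter itself is untouched
result = letter.upper()
print(result)  # A
print(letter)  # a

# Only assignment changes what the variable stores
letter = letter.upper()
print(letter)  # A
```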

Even if a method or function doesn't have parameters, you must still use the parentheses. If you type letter.upper without the parentheses, the Python interpreter will only tell you that upper is a function attached to a string. You need to use the parentheses to actually execute the method: letter.upper().
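The difference the parentheses make can be sketched like this. The name method is made up for the example, and callable is a built-in function that reports whether something can be called.

```python
letter = 'a'

# Without parentheses, you get the method itself, not its result
method = letter.upper
print(callable(method))  # True

# With parentheses, the method actually runs
print(letter.upper())  # A
print(method())        # A
```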

 

What Does the .lower() Method Do?

Similar to .upper(), Python also has a .lower() method. Let's take our all uppercase greeting and use the .lower() method to print the lowercase version of it: greeting.lower().

This returns “good morning ciprian ciprian":

# this returns "good morning ciprian ciprian"
greeting.lower()
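One common use for .lower(), offered here as a sketch rather than the only approach, is comparing two strings while ignoring differences in case. The variable names and values are made up for this example.

```python
typed = 'Good Morning'
expected = 'good morning'

# A direct comparison is case-sensitive
print(typed == expected)  # False

# Lowercasing both sides first ignores differences in case
print(typed.lower() == expected.lower())  # True
```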

 

What Does the .split() Method Do?

The third string method is the .split() method. .split() allows you to take a string and break it up into multiple parts, programmatically. This method is particularly useful for extracting words from a phrase. For instance, it allows me to get the list of words that make up my greeting. To use it, I type greeting.split(' '). Doing so generates a list of words that are in my string.

I'll talk about lists very soon, so don't worry about these brackets and commas for now. I'll come back to them.

# this returns the list of words in greeting
# the output is ['GOOD', 'MORNING', 'CIPRIAN', 'CIPRIAN']
greeting.split(' ')

In contrast to .upper() or .lower(), .split() takes one parameter. The parameter is the character (or sequence of characters) that you want to use when splitting the string. In the previous example, I chose to split by a space. But you can, if you want, split by a different character.

For example, greeting.split('O') gives me this:

# this returns ['G', '', 'D M', 'RNING CIPRIAN CIPRIAN']
greeting.split('O')

Notice that there are no letter “O”s in any of these parts. That's because the character chosen for splitting is eliminated when doing the split.

.split() also doesn't modify the value stored in my string. If I output greeting, I should see that it’s still intact with all the “O”s and spaces in place:

# this still outputs 'GOOD MORNING CIPRIAN CIPRIAN'
print(greeting)
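Splitting by a character other than a space is handy for text such as comma-separated values. A minimal sketch, using a made-up string:

```python
# A made-up string of comma-separated values
row = 'apples,pears,plums'

# Split wherever a comma appears; the commas themselves are dropped
parts = row.split(',')
print(parts)  # ['apples', 'pears', 'plums']

# The original string is unchanged
print(row)  # apples,pears,plums
```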

Although I mentioned earlier that .split() takes one parameter to split by, that parameter is optional. If you don't include it, .split() splits the string wherever it finds whitespace (spaces, tabs, or newlines) and treats several whitespace characters in a row as a single separator.

So for example, greeting.split() again gives me a list of all the words in my string:

# if you don't pass a parameter to the split method, it splits on whitespace by default
# this outputs the list of words in greeting ['GOOD', 'MORNING', 'CIPRIAN', 'CIPRIAN']
greeting.split()
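For a string with single spaces between words, greeting.split() and greeting.split(' ') give the same list. They can differ when words are separated by more than one space, as this sketch with a made-up string shows:

```python
# Two spaces between the words (made-up example)
spaced = 'GOOD  MORNING'

# With ' ' as the separator, every single space is a split point,
# so two spaces in a row leave an empty string in between
by_space = spaced.split(' ')
print(by_space)  # ['GOOD', '', 'MORNING']

# With no parameter, a run of whitespace counts as one separator
by_default = spaced.split()
print(by_default)  # ['GOOD', 'MORNING']
```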

This concludes this exploration of strings and their properties. Strings are powerful and widely used in programming. Spend some time practicing what you've learned and get comfortable with using the various properties and methods of strings. Don't limit yourself to what I showed you! Try out some of the other methods and see what they do.

In the next article, I'll look at another powerful data type called a list. Thanks for following along!

 

Ciprian Stratulat

CTO | Software Engineer

Ciprian is a software engineer and the CTO of Edlitera. As an instructor, Ciprian is a big believer in first building an intuition about a new topic, and then mastering it through guided deliberate practice.

Before Edlitera, Ciprian worked as a Software Engineer in finance, biotech, genomics and e-book publishing. Ciprian holds a degree in Computer Science from Harvard University.