Intro to Programming: Data Types

Last updated on Jun 19, 2020

Welcome to another article in our Intro to Programming series. In this article, we'll start diving deeper into what a program is and what it is made of. In particular, we'll talk about the concept of data and data types.

In my previous article, we saw our very first Python program, which was a single line of code that printed the text hello world.

print('hello world')

This might seem like a very basic thing to do, but it is a good place to start. I want you to focus on this line of code for a moment. Let's ignore the parentheses and the quotes for now. 

If we took all those symbols out, we'd basically have two parts: the word print and the phrase hello world.

The word print sounds like an instruction that we want to give to the computer, and that's exactly what it is. By typing the word print, we're instructing the computer to print something on the screen for us.

That something is the phrase hello world, and you can think of it as data. Data is information that a computer program acts on.

So, to simplify it a bit, you can think of a computer program as a set of instructions that operate on data to solve a particular problem.

That's really what it is. In its simplest form, a computer program takes data as input, does something to that data using the instructions we specified, and then outputs a result or performs some action.
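To make this concrete, here is a tiny Python sketch of that input, instructions, output cycle. The names text and result are just illustrative choices, and we haven't covered variables or the upper() method yet, so treat this as a preview rather than something to memorize:

```python
# Data: the input our tiny program works on
text = 'hello world'

# Instruction: ask Python for an uppercase copy of the text
result = text.upper()

# Output: show the result on the screen
print(result)  # HELLO WORLD
```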

That data can come from many different sources:

  • user input (someone typing it in at a keyboard, for example)
  • databases
  • sensors
  • APIs, etc.

All programming languages have built-in ways to get data from the outside world. For accessing specialized data sources, the Python community has built packages and released them as open source.
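As a small illustration, here is how a Python program might read data that arrives as the contents of a file. To keep the example self-contained, it uses io.StringIO from Python's standard library as a stand-in for a real file on disk; the file contents are made up for the example:

```python
import io

# io.StringIO lets a plain string behave like an open text file,
# so it stands in here for a real file on disk (contents are made up)
fake_file = io.StringIO('7\n12\n30\n')

# Read the whole "file" and split it into one string per line
lines = fake_file.read().splitlines()
print(lines)  # ['7', '12', '30']

# Keyboard input works in a similar spirit, for example:
#   answer = input('Type something: ')
# (not run here, because it would wait for someone to type)
```

Notice that the numbers arrive in quotes, as text. Where data comes from and what kind of data it is are two separate questions, which is exactly the distinction the next paragraphs draw.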

There's one more thing about data that you should be aware of: it comes in multiple types. Here, I want you to keep in mind the distinction between a data source and a data type.

The data source refers to the way the computer program gets the data. Is someone typing it at a keyboard? Does it come from a file, a database, or somewhere else?

The data type refers to what kind of data it is. For example, is the data a number, like 7? Is it a date, like March 17th? Is it text, like hello world? You might be thinking: that's cool, but why do we need data types? Why do we need to know whether some information passed to our program is a number or text?
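Python can tell you the type of any piece of data with the built-in type() function. Here is a quick sketch using the examples from the paragraph above. Python has no built-in way to write a date directly, so the date comes from the standard datetime module, and the year 2020 is an arbitrary choice since the text only says March 17th:

```python
import datetime

number = 7
march_17 = datetime.date(2020, 3, 17)  # year chosen arbitrarily for the example
text = 'hello world'

print(type(number))    # <class 'int'>
print(type(march_17))  # <class 'datetime.date'>
print(type(text))      # <class 'str'>
```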

The reason is simply that you can do different things with different data types. Each data type has certain abilities; you can think of them as super-powers. For example, you can subtract one number from another and get yet another number. But subtracting one word from another makes no sense mathematically. Can you divide the word hello by the word summer? No, that makes no sense. But you can divide two numbers.
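Here is a short sketch of those super-powers in action. Subtracting and dividing numbers works as expected, while dividing one string by another raises a TypeError, which is Python's way of saying the operation doesn't make sense for that data type:

```python
difference = 10 - 3   # a number minus a number gives another number
quotient = 10 / 4     # dividing two numbers works too
print(difference, quotient)  # 7 2.5

try:
    'hello' / 'summer'  # dividing one word by another
    division_worked = True
except TypeError as error:
    division_worked = False
    print('Python refuses:', error)
```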

Further complicating things is the fact that computers are not very good at understanding human concepts. If I ask a friend for a favor, say, to print a sheet of paper with the word print on it, she will understand what to do. A computer would get very confused. First of all, it would be unable to parse that request, because computers are not yet very good at extracting meaning from human speech. I would have to simplify my request and express it in a computer language (a programming language).

I could instruct the computer as follows:

print print

The sequence of characters, p-r-i-n-t, appears twice in that request, but which one is the instruction, and which one is the data? To avoid the confusion, I could make an agreement with the computer that I will always use single or double quotes around text and that everything that is not within quotes should be interpreted as a symbol (for example, an instruction).

So I could then instruct the computer as follows:

print 'print'

This is not valid Python 3 code, because Python 3 requires parentheses around the data, as in our hello world example. It is, however, valid Ruby code (Ruby is another programming language). Now the computer can tell the two print words apart.
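For comparison, here is a small Python 3 sketch of the same idea. The quotes mark 'print' as text (data), while the unquoted print is the built-in function (the instruction). The last part uses the built-in compile() function only to check that the Ruby-style line is rejected by Python 3:

```python
data = 'print'      # in quotes: just text, five characters long
print(data)         # no quotes: the print instruction; displays the text print

print(type(data))       # <class 'str'>
print(callable(print))  # True: the unquoted print is something we can call

# Python 3 rejects the Ruby-style version without parentheses
try:
    compile("print 'print'", '<example>', 'exec')
    ruby_style_accepted = True
except SyntaxError:
    ruby_style_accepted = False
print(ruby_style_accepted)  # False
```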

So, because all programs that we write are nothing more than sequences of characters (letters, digits, and symbols), we need a convention, an agreement between us and the computer, about how to interpret different sequences of characters.

Later in this blog series we'll get acquainted with the various conventions we have in place in the Python programming language. For now, I just want you to be aware that these conventions exist and are necessary to avoid confusion.

Let's have a look at some of the basic data types available in Python. Don't worry about memorizing the information in the table below, because we will revisit each of these data types in detail and learn about their special abilities.

English name            Python name   Examples
Integer                 int           20, 10, 0, -300
Boolean                 bool          True, False
Floating-point number   float         3.14, 2.73
String                  str           'hello world', "hey88340", '2018'
List                    list          [10, '2018', 'hi']
Dictionary              dict          {"la tarde": "afternoon", "noche": "night", "diez": 10}
Tuple                   tuple         (10, 20.0, 'world', -5)

In the table above, we have three columns: the first is for the English name of the data type, and the second is for the Python name of the data type. You'll notice that the Python name is very similar to the English word, except abbreviated in some cases. Finally, the last column provides some examples of data that fits each data type. If you pay attention, you will also notice the conventions we have in place for representing each data type (e.g. quotes, different kinds of brackets, etc.).

Let's start from the top and briefly go over the basic Python data types.

Integers

The integer data type is reserved for numbers that can be written without a decimal component. These are whole numbers, and they can be positive, negative, or zero, like:

20 10 0 -300
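A quick sketch of integers in Python, using two of the example values above (the names count and balance are only illustrative). Adding two integers gives another integer, and type() confirms it:

```python
count = 20
balance = -300

total = count + balance     # integer + integer gives an integer
print(total)                # -280
print(type(total))          # <class 'int'>
```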

Booleans

The boolean data type can only have one of two values: True or False. We'll go over it in more detail later, and you'll see that this data type is crucial for writing program logic.

True False
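Booleans often come from comparisons. In this minimal sketch, comparing two numbers produces True or False; the name is_bigger is just an illustrative choice. Note that Python spells these values with a capital letter:

```python
is_bigger = 10 > 3
print(is_bigger)        # True
print(type(is_bigger))  # <class 'bool'>
print(10 == 3)          # False
```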

Floating-point numbers

The floating-point (float) data type is used for numbers that have a decimal component, so numbers like:

3.14 2.73
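A short sketch showing where floats show up. Dividing two integers with / produces a float, and a decimal point makes a number a float even when its value is whole, as with the 20.0 you'll see in the tuple example later:

```python
pi = 3.14
print(type(pi))       # <class 'float'>

half = 7 / 2          # / always gives a float, even for two integers
print(half)           # 3.5

print(type(20.0))     # <class 'float'>: the decimal point makes it a float
print(20.0 == 20)     # True: same value, different data types
```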

Strings

Next, we have the string data type. We've already seen this data type in action: it's used to represent text like "hello world". Basically, a string is a sequence of characters surrounded by either single quotes or double quotes. You can already notice something interesting: if I put quotes around the integer 2018, it becomes a string. Again, we'll go over strings in more detail later. Here are some examples of strings:

'hello world' "hey88340" '2018'
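To see why the quotes around 2018 matter, here is a small sketch comparing the string '2018' with the integer 2018. With numbers, + adds; with strings, + joins (concatenates) the text:

```python
year_text = '2018'    # a string: quotes around it
year_number = 2018    # an integer: no quotes

print(year_number + 1)     # 2019
print(year_text + '1')     # 20181
print(type(year_text), type(year_number))  # <class 'str'> <class 'int'>

# Single and double quotes produce exactly the same string
print('hey88340' == "hey88340")  # True
```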

Lists

Lists are exactly what they sound like: lists of items. This data type is incredibly powerful, as we'll see. The general concept of a list of items is present in pretty much all programming languages. However, Python lists, unlike lists in certain other programming languages, can hold items of any data type. For example, we can have lists that contain integers, strings and booleans, and even other lists! Lists are defined using square brackets, with the items separated by commas. We'll go over lists in more detail later. In the meantime, here is a simple example of a list:

[10, '2018', 'hi']
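Here is a small sketch showing that a single Python list can mix data types, including booleans and even another list. Positions in a list start at 0:

```python
items = [10, '2018', 'hi']
items.append(True)      # add a boolean to the end
items.append([1, 2])    # lists can even hold other lists

print(items)       # [10, '2018', 'hi', True, [1, 2]]
print(len(items))  # 5
print(items[0])    # 10: the first item is at position 0
```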

Dictionaries

Next, we have dictionaries. To understand these, think of an actual dictionary, say, a Spanish-English dictionary. It is organized by keys, which are the words in Spanish, and each key is paired with a value, which is its translation in English. This data type is based on the same idea: each key is followed by a value.

A Python dictionary is written using curly braces, with a colon between each key and its value and commas separating the key-value pairs. We will see that there are some restrictions on which data types can be used as keys. Values, however, can be of any data type. Below is an example. Notice that the word diez (a string) is mapped to the integer 10. This is perfectly fine.

{"la tarde": "afternoon", "noche": "night", "diez": 10}
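A minimal sketch of working with the dictionary above: you look up a value by putting its key in square brackets. The second part previews one of the key restrictions mentioned above: a list cannot be used as a key, and Python raises a TypeError if you try:

```python
spanish_english = {"la tarde": "afternoon", "noche": "night", "diez": 10}

print(spanish_english["noche"])        # night
print(type(spanish_english["diez"]))   # <class 'int'>

try:
    {[1, 2]: 'a list as a key'}
    list_key_allowed = True
except TypeError as error:
    list_key_allowed = False
    print('Not allowed:', error)
```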

Tuples

Last but not least, we have tuples. Tuples are very similar to lists, except that we write them between parentheses instead of square brackets. We'll learn how they differ from lists and why they're important later in the course. Here is a tuple example:

(10, 20.0, 'world', -5)
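A short sketch of the tuple above. You read items by position, just like with a list. As a preview of one difference the course will come back to, trying to change an item in a tuple raises a TypeError:

```python
point = (10, 20.0, 'world', -5)

print(type(point))  # <class 'tuple'>
print(point[2])     # world

try:
    point[0] = 99   # tuples cannot be changed after they are created
    tuple_changed = True
except TypeError:
    tuple_changed = False
print(tuple_changed)  # False
```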

There are, of course, other data types, but these are the fundamental ones we will cover in this blog series. Once you are comfortable using these, you will be well prepared to understand the other data types you might encounter in your programming career.

Again, don't worry too much about the specifics right now, as this is just a preview. We'll cover each data type in detail in the following articles. For now, it's important to remember that programs are sets of instructions that operate on data, and that data can come from multiple data sources (such as user input, databases, files, etc.). Also, each piece of data has a certain data type, and that matters because different data types have different super-powers.


About the author

Ciprian is a software engineer and the CTO of Edlitera. As an instructor, Ciprian is a big believer in first building an intuition about a new topic, and then mastering it through guided deliberate practice.

Before Edlitera, Ciprian worked as a Software Engineer in finance, biotech, genomics and e-book publishing. Ciprian holds a degree in Computer Science from Harvard University.