Filtering Data in Pandas: Conditions and Boolean Indexing

When you first start working with pandas, you’ll often load in a dataset that’s larger than you need. Maybe you only want to look at customers from Canada, or products that cost more than $100, or rows where sales were above average.

This is where filtering comes in. Filtering is the process of narrowing down your DataFrame so you only see the rows that match certain rules (conditions). In pandas, this is done using something called Boolean indexing.

What is Boolean Indexing?

“Boolean” just means True or False. When you apply a condition to a pandas column, pandas checks each value and marks whether it’s True (it meets the condition) or False (it doesn’t).

For example:

import pandas as pd

data = {
    "Country": ["Canada", "USA", "Mexico", "UK", "Germany"],
    "Population": [38, 331, 128, 67, 83],  # in millions
    "Continent": ["North America", "North America", "North America", "Europe", "Europe"]
}

df = pd.DataFrame(data)

print(df["Population"] > 100)

This outputs:

0    False
1     True
2     True
3    False
4    False
Name: Population, dtype: bool

Each row is marked True or False depending on whether the population is greater than 100.
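Because the result is an ordinary pandas Series of True/False values, you can store it and inspect it like any other Series. The sketch below reuses the example data; the variable name mask is only an illustration, not a pandas requirement.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Canada", "USA", "Mexico", "UK", "Germany"],
    "Population": [38, 331, 128, 67, 83],  # in millions
})

# The comparison produces one True/False value per row
mask = df["Population"] > 100

print(mask.dtype)  # bool
print(mask.sum())  # True counts as 1, so this prints the number of matching rows: 2
```

Counting the True values this way is a quick check on how many rows a condition will keep before you actually filter.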

Step 1: Using Conditions to Filter Rows

Once you have a Series of True/False values, you can pass it back into the DataFrame inside square brackets. pandas keeps the rows marked True and drops the rest:

filtered = df[df["Population"] > 100]
print(filtered)

Output:

   Country  Population      Continent
1      USA         331  North America
2   Mexico         128  North America

Now you only see countries with populations greater than 100 million.
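The threshold does not have to be a fixed number. As a rough sketch of the "above average" case mentioned in the introduction, the comparison below uses the column's own mean. It also shows that filtered rows keep their original index labels; calling reset_index(drop=True) is one optional way to renumber them. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Canada", "USA", "Mexico", "UK", "Germany"],
    "Population": [38, 331, 128, 67, 83],  # in millions
})

# Compute the threshold from the data itself
average = df["Population"].mean()  # 129.4

# Keep only rows above that average
above_average = df[df["Population"] > average]
print(above_average)

# The surviving row still carries its original label (1).
# Renumber from 0 if you prefer a clean index.
renumbered = above_average.reset_index(drop=True)
print(renumbered)
```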

Step 2: Combining Multiple Conditions

Real-world analysis usually needs more than one condition. In pandas:

  • & means AND

  • | means OR

  • Always wrap each condition in parentheses, because & and | bind more tightly than comparisons such as > and ==

# Countries with Population > 100 AND in North America
filtered = df[(df["Population"] > 100) & (df["Continent"] == "North America")]
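The OR operator follows the same pattern. The sketch below also uses ~, which pandas treats as NOT on a Boolean Series. It is not covered in the list above, so treat it as an optional extra.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Canada", "USA", "Mexico", "UK", "Germany"],
    "Population": [38, 331, 128, 67, 83],  # in millions
    "Continent": ["North America", "North America", "North America", "Europe", "Europe"],
})

# OR: countries that are in Europe, or have a population under 50 million
either = df[(df["Continent"] == "Europe") | (df["Population"] < 50)]
print(either["Country"].tolist())  # ['Canada', 'UK', 'Germany']

# NOT: countries that are not in North America
not_north_america = df[~(df["Continent"] == "North America")]
print(not_north_america["Country"].tolist())  # ['UK', 'Germany']
```

Python's own and, or, and not keywords do not work element by element on a Series, which is why the symbol operators are used here.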

Step 3: Filtering Text Columns

Conditions work just as well with text:

# Countries in Europe
european = df[df["Continent"] == "Europe"]

Or with multiple categories:

# Countries in Europe or North America
filtered = df[df["Continent"].isin(["Europe", "North America"])]
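Exact matches are not the only option for text. As a small sketch that goes slightly beyond the examples above, the .str accessor provides string tests such as startswith and contains, and each one returns a True/False Series you can filter with. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Canada", "USA", "Mexico", "UK", "Germany"],
    "Continent": ["North America", "North America", "North America", "Europe", "Europe"],
})

# Countries whose name starts with "U"
starts_with_u = df[df["Country"].str.startswith("U")]
print(starts_with_u["Country"].tolist())  # ['USA', 'UK']

# Rows whose continent name contains the word "America"
americas = df[df["Continent"].str.contains("America")]
print(americas["Country"].tolist())  # ['Canada', 'USA', 'Mexico']

# Negating isin keeps everything outside the listed categories
outside_europe = df[~df["Continent"].isin(["Europe"])]
print(outside_europe["Country"].tolist())  # ['Canada', 'USA', 'Mexico']
```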

Step 4: Why Filtering Matters

Filtering is at the heart of analysis:

  • It lets you zoom in on the part of the dataset you care about.

  • It helps with cleaning (e.g., removing invalid rows).

  • It’s the first step before visualizing or calculating statistics.
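To make the cleaning point concrete, here is a minimal sketch. The messy values are hypothetical and invented for illustration, and what counts as "invalid" depends on your data. Here a row is dropped if its population is missing or not positive.

```python
import pandas as pd

# Hypothetical messy data: one missing value and one impossible negative value
raw = pd.DataFrame({
    "Country": ["Canada", "USA", "Mexico", "UK"],
    "Population": [38, None, -5, 67],  # in millions
})

# A row is valid only if the population is present AND greater than zero
valid = raw["Population"].notna() & (raw["Population"] > 0)

clean = raw[valid]
print(clean["Country"].tolist())  # ['Canada', 'UK']
```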

For example, if you wanted to compare population trends in Europe versus North America, you’d start by filtering your dataset into those regions.
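The example data has a single population figure per country, so it cannot show a trend over time. As a simplified sketch of the same idea, you can filter each region into its own DataFrame and compare a summary statistic:

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Canada", "USA", "Mexico", "UK", "Germany"],
    "Population": [38, 331, 128, 67, 83],  # in millions
    "Continent": ["North America", "North America", "North America", "Europe", "Europe"],
})

# One filtered DataFrame per region
europe = df[df["Continent"] == "Europe"]
north_america = df[df["Continent"] == "North America"]

# Compare the average population (in millions) of the two regions
europe_mean = europe["Population"].mean()
north_america_mean = north_america["Population"].mean()

print(f"Europe: {europe_mean:.1f}")                # Europe: 75.0
print(f"North America: {north_america_mean:.1f}")  # North America: 165.7
```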

👉 Next, once you’ve filtered data, you’ll often want to sort or rank it to see the biggest and smallest values clearly.

👉 Read the next tutorial: Sorting and Ranking Data in Pandas

FWD EDITORS

We’re a team of data enthusiasts and storytellers. Our goal is to share stories we find interesting in hopes of inspiring others to incorporate data and data visualizations in the stories they create.
