Filtering Data in Pandas: Conditions and Boolean Indexing
When you first start working with pandas, you’ll often load in a dataset that’s larger than you need. Maybe you only want to look at customers from Canada, or products that cost more than $100, or rows where sales were above average.
This is where filtering comes in. Filtering is the process of narrowing down your DataFrame so you only see the rows that match certain rules (conditions). In pandas, this is done using something called Boolean indexing.
What is Boolean Indexing?
“Boolean” just means True or False. When you apply a condition to a pandas column, pandas checks each value and returns a Series of True/False values: True where the value meets the condition, False where it doesn’t.
For example:
import pandas as pd
data = {
    "Country": ["Canada", "USA", "Mexico", "UK", "Germany"],
    "Population": [38, 331, 128, 67, 83],  # in millions
    "Continent": ["North America", "North America", "North America", "Europe", "Europe"]
}
df = pd.DataFrame(data)
print(df["Population"] > 100)
This outputs:
0    False
1     True
2     True
3    False
4    False
Name: Population, dtype: bool
Each row is marked True or False depending on whether the population is greater than 100.
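Because the result of a condition is an ordinary pandas Series, it can be stored in a variable and inspected before it is used for filtering. The sketch below is a minimal illustration on the same data; the names `mask` and `n_matches` are chosen here for the example and are not part of pandas.

```python
import pandas as pd

# Same small example as above; Population is in millions.
df = pd.DataFrame({
    "Country": ["Canada", "USA", "Mexico", "UK", "Germany"],
    "Population": [38, 331, 128, 67, 83],
})

# Store the True/False Series so it can be inspected and reused.
mask = df["Population"] > 100

# True counts as 1 and False as 0, so sum() counts the matching rows.
n_matches = int(mask.sum())
print(n_matches)  # 2
```

Counting the True values first is a quick sanity check that a condition matches roughly as many rows as you expect.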
Step 1: Using Conditions to Filter Rows
Once you have a Series of True/False values, you can pass it back into the DataFrame inside square brackets. pandas keeps the rows where the value is True and drops the rest:
filtered = df[df["Population"] > 100]
print(filtered)
Output:
| Country | Population (millions) | Continent |
|---|---|---|
| USA | 331 | North America |
| Mexico | 128 | North America |
Now you only see countries with populations greater than 100 million.
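The same filter can also be written with `.loc`, which lets you pick columns in the same step. This is one possible variation rather than the only way to do it; the variable names `mask`, `large`, and `large_renumbered` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Canada", "USA", "Mexico", "UK", "Germany"],
    "Population": [38, 331, 128, 67, 83],  # in millions
    "Continent": ["North America", "North America", "North America", "Europe", "Europe"],
})

mask = df["Population"] > 100

# .loc takes a row selector and a column selector in one step.
large = df.loc[mask, ["Country", "Population"]]
print(large)

# Filtering keeps the original row labels (1 and 2 here);
# reset_index(drop=True) renumbers them from 0 if that is preferred.
large_renumbered = large.reset_index(drop=True)
print(large_renumbered)
```

Note that the filtered rows keep their original index labels, which is why the result starts at 1 instead of 0 unless you reset the index.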
Step 2: Combining Multiple Conditions
Real-world analysis usually needs more than one condition. In pandas:
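& means AND
| means OR
Always wrap each condition in parentheses: in Python, & and | bind more tightly than comparisons such as > and ==, so without parentheses the expression is evaluated in the wrong order.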
# Countries with Population > 100 AND in North America
filtered = df[(df["Population"] > 100) & (df["Continent"] == "North America")]
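To see how AND and OR differ on the same data, the sketch below runs both side by side. The variable names `both` and `either` are chosen for this example only.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Canada", "USA", "Mexico", "UK", "Germany"],
    "Population": [38, 331, 128, 67, 83],  # in millions
    "Continent": ["North America", "North America", "North America", "Europe", "Europe"],
})

# AND: a row must satisfy both conditions.
both = df[(df["Population"] > 100) & (df["Continent"] == "North America")]
print(both["Country"].tolist())    # ['USA', 'Mexico']

# OR: a row only needs to satisfy one of the conditions.
either = df[(df["Population"] > 100) | (df["Continent"] == "Europe")]
print(either["Country"].tolist())  # ['USA', 'Mexico', 'UK', 'Germany']
```

AND narrows the result, while OR widens it: here the OR version also picks up the UK and Germany because they are in Europe, even though their populations are under 100 million.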
Step 3: Filtering Text Columns
Conditions work just as well with text:
# Countries in Europe
european = df[df["Continent"] == "Europe"]
Or with multiple categories:
# Countries in Europe or North America
filtered = df[df["Continent"].isin(["Europe", "North America"])]
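One way to think about `isin` is as shorthand for several `==` checks joined with `|`. The sketch below compares the two forms on the Country column; the list `wanted` and the names `by_isin` and `by_or` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Canada", "USA", "Mexico", "UK", "Germany"],
    "Population": [38, 331, 128, 67, 83],  # in millions
    "Continent": ["North America", "North America", "North America", "Europe", "Europe"],
})

wanted = ["Canada", "UK"]

# isin checks each value against the whole list at once.
by_isin = df[df["Country"].isin(wanted)]

# The same filter written as separate == checks joined with OR.
by_or = df[(df["Country"] == "Canada") | (df["Country"] == "UK")]

print(by_isin["Country"].tolist())  # ['Canada', 'UK']
print(by_isin.equals(by_or))        # True
```

With two values the difference is small, but `isin` stays readable as the list of categories grows.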
Step 4: Why Filtering Matters
Filtering is at the heart of analysis:
It lets you zoom in on the part of the dataset you care about.
It helps with cleaning (e.g., removing invalid rows).
It’s the first step before visualizing or calculating statistics.
For example, if you wanted to compare population trends in Europe versus North America, you’d start by filtering your dataset into those regions.
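As a rough sketch of that workflow, the example below filters the small sample dataset into the two regions and adds up each region's population. It uses only the five sample countries, so the totals describe this toy data, not the real continents; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Canada", "USA", "Mexico", "UK", "Germany"],
    "Population": [38, 331, 128, 67, 83],  # in millions
    "Continent": ["North America", "North America", "North America", "Europe", "Europe"],
})

# Filter the dataset into one DataFrame per region.
europe = df[df["Continent"] == "Europe"]
north_america = df[df["Continent"] == "North America"]

# Summarize each filtered subset separately.
europe_total = int(europe["Population"].sum())
north_america_total = int(north_america["Population"].sum())

print(europe_total)         # 150
print(north_america_total)  # 497
```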
👉 Next, once you’ve filtered data, you’ll often want to sort or rank it to see the biggest and smallest values clearly.
👉 Read the next tutorial: Sorting and Ranking Data in Pandas