Getting Started With the Python pandas Library
If you work with data in Python, pandas is one of the most powerful libraries you can learn. It provides high-level data structures and tools designed to make data manipulation, cleaning, and analysis simple and fast. This guide gives a practical introduction to pandas: how to install it, import it, create your first DataFrame, and perform some basic operations.
Why Use pandas?
Here are several reasons to learn pandas when working with data in Python:
pandas supports two primary data structures: the Series (1-D) and the DataFrame (2-D), which make working with tabular or structured data intuitive.
It simplifies tasks like reading data from CSV/Excel/JSON, cleaning missing values, filtering and transforming data, summarizing statistics, and much more.
It integrates well with other popular Python libraries such as NumPy, Matplotlib, and scikit‑learn, making it a core piece of the “PyData” stack.
For learners and practitioners, understanding pandas is an essential step to move from raw data to insight.
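As a minimal sketch of the two structures mentioned above, the snippet below builds one Series and one DataFrame. The variable names and sample values are illustrative assumptions, not part of any required API.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
passings = pd.Series([3, 7, 2], name="Passings")

# A DataFrame is a two-dimensional table; each of its columns is a Series.
cars = pd.DataFrame({"Cars": ["BMW", "Volvo", "Ford"], "Passings": [3, 7, 2]})

print(passings.ndim)  # 1
print(cars.ndim)      # 2
print(isinstance(cars["Passings"], pd.Series))  # True
```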
Installing pandas
Before you begin, you need Python 3 installed (current pandas releases do not support Python 2). You can install pandas with pip:
pip install pandas
If you are using a distribution such as Anaconda (which often comes with pandas pre-installed), you may skip this step.
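If you are not sure whether pandas is already present in your environment, one possible check uses only the standard library. This is a sketch, and the variable name pandas_installed is just an illustrative choice.

```python
import importlib.util

# find_spec returns None when the package cannot be found on the import path.
spec = importlib.util.find_spec("pandas")
pandas_installed = spec is not None

if pandas_installed:
    print("pandas is available")
else:
    print("pandas is not installed; run: pip install pandas")
```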
Importing pandas
In your Python script or Jupyter Notebook, you typically import pandas like this:
import pandas as pd
Using pd as an alias is a widespread convention and makes your code more concise and readable. Example:
import pandas as pd
print(pd.__version__) # Check pandas version
Knowing your installed version can help with compatibility and debugging.
Creating Your First DataFrame
One of the key data structures in pandas is the DataFrame: essentially a two-dimensional table with labeled rows and columns, similar to a spreadsheet, a SQL table, or a dictionary of Series objects.
Here’s a simple example:
import pandas as pd
data = {
    'Cars': ["BMW", "Volvo", "Ford"],
    'Passings': [3, 7, 2],
}
df = pd.DataFrame(data)
print(df)
Output:
    Cars  Passings
0    BMW         3
1  Volvo         7
2   Ford         2
In this example:
data is a dictionary where keys become column names, and values are lists of data for each column.
df is a DataFrame object containing the tabular data.
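A dictionary of lists is only one way to build a DataFrame. The sketch below shows two variations, assuming the same sample data: a list of row dictionaries, and custom row labels passed through the index parameter. The labels "a", "b", and "c" are hypothetical.

```python
import pandas as pd

# Each dictionary is one row; its keys become column names.
records = [
    {"Cars": "BMW", "Passings": 3},
    {"Cars": "Volvo", "Passings": 7},
    {"Cars": "Ford", "Passings": 2},
]
df_from_records = pd.DataFrame(records)

# Hypothetical row labels replace the default 0, 1, 2 index.
df_labeled = pd.DataFrame(records, index=["a", "b", "c"])

print(df_from_records.shape)     # (3, 2)
print(list(df_labeled.index))    # ['a', 'b', 'c']
```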
Common Operations & Methods
Accessing Columns
You can access columns by their column name:
print(df['Cars'])
This returns a Series containing the "Cars" column.
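The result type depends on what you pass inside the brackets. As a short sketch using the same sample data, a single column name gives a Series, while a list of names gives a DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"Cars": ["BMW", "Volvo", "Ford"], "Passings": [3, 7, 2]})

cars = df["Cars"]      # single label -> Series
subset = df[["Cars"]]  # list of labels -> DataFrame, even with one column

print(type(cars).__name__)    # Series
print(type(subset).__name__)  # DataFrame
print(cars.tolist())          # ['BMW', 'Volvo', 'Ford']
```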
Accessing Rows
Use the .loc[] accessor for label-based indexing or .iloc[] for integer-position indexing. Example:
# Using .loc (label-based) – here row labels are default integers
print(df.loc[0]) # returns the first row as a Series
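With the default integer index, .loc[0] and .iloc[0] happen to return the same row, which hides the difference between them. The sketch below assumes hypothetical string labels ("a", "b", "c") to make the distinction visible.

```python
import pandas as pd

df = pd.DataFrame(
    {"Cars": ["BMW", "Volvo", "Ford"], "Passings": [3, 7, 2]},
    index=["a", "b", "c"],  # hypothetical row labels
)

first_by_label = df.loc["a"]    # label-based: the row labeled "a"
first_by_position = df.iloc[0]  # position-based: the first row

print(first_by_label.equals(first_by_position))  # True

last_row = df.iloc[-1]  # negative positions count from the end
print(last_row["Cars"])  # Ford
```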
Selecting Multiple Rows & Columns
You can combine row and column selection:
print(df.loc[[0, 1], ['Cars', 'Passings']])
This gives a DataFrame of rows 0 and 1 and the specified columns.
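Slices work in both accessors, with one difference worth knowing: a .loc slice includes its end label, while an .iloc slice excludes its stop position. A minimal sketch with the same sample data:

```python
import pandas as pd

df = pd.DataFrame({"Cars": ["BMW", "Volvo", "Ford"], "Passings": [3, 7, 2]})

by_label = df.loc[0:1, ["Cars"]]  # labels 0 through 1, both included
by_position = df.iloc[0:2, 0:1]   # positions 0 and 1; stop position 2 is excluded

print(by_label.equals(by_position))  # True

# A single row label plus a single column label returns one value.
single_value = df.loc[1, "Passings"]
print(single_value)  # 7
```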
Checking Version & Metadata
To check which version of pandas you have (helpful for compatibility):
import pandas as pd
print(pd.__version__)
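Beyond the library version, a DataFrame carries its own metadata. As a rough sketch with the same sample data, these attributes and methods are a common first look at a new table:

```python
import pandas as pd

df = pd.DataFrame({"Cars": ["BMW", "Volvo", "Ford"], "Passings": [3, 7, 2]})

print(df.shape)          # (3, 2): rows, columns
print(list(df.columns))  # ['Cars', 'Passings']
print(df.dtypes)         # data type of each column
df.info()                # prints index range, non-null counts, and memory usage

# Helper from pandas.api.types to test a column's type in code.
passings_is_integer = pd.api.types.is_integer_dtype(df["Passings"])
print(passings_is_integer)  # True
```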
Why These Basics Matter
Getting comfortable with installation, import, creating a DataFrame, and basic selection is foundational. Once you have the structure and syntax down, you’ll be ready to move into more advanced tasks such as reading external files, cleaning messy data, transforming data, summarizing statistics, grouping, and visualizing results.
The official pandas “10 minutes to pandas” guide describes exactly these foundational structures and builds from there.
Next Steps
Now that you have the basics in place, here’s a quick roadmap of what you can explore next:
Reading and writing data: CSV files (pd.read_csv()), Excel (pd.read_excel()), JSON (pd.read_json()), SQL, and more.
Handling missing values: Use methods like .dropna() and .fillna().
Filtering and sorting: Boolean indexing (e.g., df[df['Passings'] > 5]), .sort_values(), .sort_index().
Grouping and aggregating: The powerful split-apply-combine paradigm via .groupby().
Merging and joining: Combining multiple DataFrames with functions like pd.merge() and pd.concat().
Time series and date functionality: If your data has dates/times you can use date-based indexing, resampling, and rolling windows.
Visualization: Using the pandas built-in .plot() method or integrating with Matplotlib/Seaborn.
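To give a taste of several of these topics together, here is a small sketch. The CSV text, the Country and Continent columns, and the extra Audi row are hypothetical sample data invented for illustration; io.StringIO stands in for a real file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content; note the missing Passings value for Ford.
csv_text = """Cars,Passings,Country
BMW,3,Germany
Volvo,7,Sweden
Ford,,USA
Audi,5,Germany
"""
df = pd.read_csv(io.StringIO(csv_text))

# Handling missing values: replace the missing count with 0.
df["Passings"] = df["Passings"].fillna(0)

# Filtering and sorting: keep rows above 4, highest first.
busy = df[df["Passings"] > 4].sort_values("Passings", ascending=False)
print(busy["Cars"].tolist())  # ['Volvo', 'Audi']

# Grouping and aggregating: total passings per country.
totals = df.groupby("Country")["Passings"].sum()
print(totals["Germany"])  # 8.0

# Merging: attach a hypothetical lookup table on the shared Country column.
continents = pd.DataFrame(
    {
        "Country": ["Germany", "Sweden", "USA"],
        "Continent": ["Europe", "Europe", "North America"],
    }
)
merged = pd.merge(df, continents, on="Country")
print(merged.shape)  # (4, 4)
```

Because the Passings column contains a missing value when it is read, pandas stores it as floating point, which is why the group total prints as 8.0 rather than 8.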
Summary
In this beginner-friendly guide to pandas we covered:
Why pandas matters for data analysis in Python.
How to install and import pandas.
How to create a basic DataFrame from Python dictionaries.
How to select columns and rows with [], .loc[], and .iloc[].
What you should do next to level up your skills.
Whether you are working with small pieces of data or large datasets, mastering pandas will enable you to move from raw data to insight with relative ease.
If you’re ready, dive into one of the next topics (reading CSVs, cleaning data, grouping/aggregating) and start practicing with real datasets. Happy analysing!