Getting Started With the Python pandas Library

Nov 5

If you work with data in Python, pandas is one of the most useful libraries you can learn. It provides high-level data structures and tools that make data manipulation, cleaning, and analysis straightforward and fast. This guide is a practical introduction to pandas: how to install it, import it, create your first DataFrame, and perform some basic operations.

Why Use pandas?

Here are several reasons to learn pandas when working with data in Python:

pandas supports two primary data structures: the Series (1-D) and the DataFrame (2-D), which make working with tabular or structured data intuitive.
It simplifies tasks like reading data from CSV/Excel/JSON, cleaning missing values, filtering and transforming data, summarizing statistics, and much more.
It integrates well with other popular Python libraries such as NumPy, Matplotlib, and scikit‑learn, making it a core piece of the “PyData” stack.
For learners and practitioners, understanding pandas is an essential step to move from raw data to insight.
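To make the two structures concrete, here is a minimal sketch that builds one Series and one DataFrame and checks their dimensionality. The values are illustrative only.

```python
import pandas as pd

# A Series is a one-dimensional labelled array.
passings = pd.Series([3, 7, 2], name="Passings")

# A DataFrame is a two-dimensional table; each of its columns is a Series.
df = pd.DataFrame({"Cars": ["BMW", "Volvo", "Ford"], "Passings": [3, 7, 2]})

print(passings.ndim)                  # 1
print(df.ndim)                        # 2
print(type(df["Passings"]).__name__)  # Series
```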

Installing pandas

Before you begin, you need Python 3 installed (pandas no longer supports Python 2, and recent releases require a fairly recent Python 3 version). You can install pandas with pip:

pip install pandas

If you are using a distribution such as Anaconda (which often comes with pandas pre-installed), you may skip this step.
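If you are unsure whether pandas is already present in your environment, one way to check from Python without importing it is sketched below. It uses only the standard library; the printed messages are just illustrative.

```python
import importlib.util

# find_spec returns None when the package cannot be found on the import path.
spec = importlib.util.find_spec("pandas")

if spec is None:
    print("pandas is not installed; run: pip install pandas")
else:
    print("pandas is available")
```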

Importing pandas

In your Python script or Jupyter Notebook, you typically import pandas like this:

import pandas as pd

Using pd as an alias is a widespread convention and makes your code more concise and readable. Example:

import pandas as pd
print(pd.__version__)    # Check pandas version

Knowing your installed version can help with compatibility and debugging.

Creating Your First DataFrame

One of the key data structures in pandas is the DataFrame: a two-dimensional table with labelled rows and columns, similar to a spreadsheet, a SQL table, or a dictionary of Series objects.

Here’s a simple example:

import pandas as pd

data = {
    'Cars': ["BMW", "Volvo", "Ford"],
    'Passings': [3, 7, 2]
}

df = pd.DataFrame(data)
print(df)

Output:

    Cars  Passings
0    BMW         3
1  Volvo         7
2   Ford         2

In this example:

data is a dictionary where keys become column names, and values are lists of data for each column.
df is a DataFrame object containing the tabular data.

Common Operations & Methods

Accessing Columns

You can access columns by their column name:

print(df['Cars'])

This returns a Series containing the "Cars" column.
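As a small sketch of the difference between selecting with a single name and with a list of names, the example below rebuilds the same DataFrame so it can run on its own. A single label gives a Series, while a list of labels gives a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"Cars": ["BMW", "Volvo", "Ford"], "Passings": [3, 7, 2]})

cars = df["Cars"]      # single label -> Series
subset = df[["Cars"]]  # list of labels -> DataFrame with one column

print(type(cars).__name__)    # Series
print(type(subset).__name__)  # DataFrame
print(cars.tolist())          # ['BMW', 'Volvo', 'Ford']
```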

Accessing Rows

Use the .loc[] accessor for label-based indexing or .iloc[] for integer-position indexing. Example:

# Using .loc (label-based) – here row labels are default integers
print(df.loc[0])   # returns the first row as a Series
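With the default integer index, .loc[0] and .iloc[0] happen to return the same row, which can hide the difference between them. The sketch below assumes a hypothetical string index ("a", "b", "c"), which is not part of the original example, to show label-based and position-based lookup side by side.

```python
import pandas as pd

df = pd.DataFrame(
    {"Cars": ["BMW", "Volvo", "Ford"], "Passings": [3, 7, 2]},
    index=["a", "b", "c"],  # hypothetical string labels, for illustration only
)

by_label = df.loc["b"]    # the row whose label is "b"
by_position = df.iloc[1]  # the second row, counting from 0

print(by_label["Cars"])     # Volvo
print(by_position["Cars"])  # Volvo
print(df.iloc[-1]["Cars"])  # Ford (negative positions count from the end)
```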

Selecting Multiple Rows & Columns

You can combine row and column selection:

print(df.loc[ [0,1], ['Cars', 'Passings'] ])

This gives a DataFrame of rows 0 and 1 and the specified columns.
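Slices work with both accessors, with one difference worth knowing: a .loc slice includes its end label, while an .iloc slice excludes its end position, as ordinary Python slicing does. A minimal sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({"Cars": ["BMW", "Volvo", "Ford"], "Passings": [3, 7, 2]})

label_slice = df.loc[0:1, "Cars"]  # labels 0 through 1, end label included
position_slice = df.iloc[0:1, 0]   # position 0 only, end position excluded

print(label_slice.tolist())     # ['BMW', 'Volvo']
print(position_slice.tolist())  # ['BMW']
```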

Checking Version & Metadata

To check which version of pandas you have (helpful for compatibility):

import pandas as pd
print(pd.__version__)
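Beyond the library version, a DataFrame carries its own metadata. The sketch below shows a few commonly used attributes on the example data. The exact dtype names printed by df.dtypes and df.info() can vary between pandas versions, so no output is shown for them.

```python
import pandas as pd

df = pd.DataFrame({"Cars": ["BMW", "Volvo", "Ford"], "Passings": [3, 7, 2]})

print(df.shape)             # (3, 2): three rows, two columns
print(df.columns.tolist())  # ['Cars', 'Passings']
print(df.index.tolist())    # [0, 1, 2]
print(df.dtypes)            # data type of each column
df.info()                   # summary of columns, non-null counts, memory use
```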

Why These Basics Matter

Getting comfortable with installation, import, creating a DataFrame, and basic selection is foundational. Once you have the structure and syntax down, you’ll be ready to move into more advanced tasks such as reading external files, cleaning messy data, transforming data, summarizing statistics, grouping, and visualizing results.

The official pandas “10 minutes to pandas” guide describes exactly these foundational structures and builds from there.

Next Steps

Now that you have the basics in place, here’s a quick road-map of what you can explore next:

Reading and writing data — CSV files (pd.read_csv()), Excel (pd.read_excel()), JSON (pd.read_json()), SQL, and more.
Handling missing values — Use methods like .dropna(), .fillna().
Filtering and sorting — Boolean indexing (e.g., df[df['Passings'] > 5]), .sort_values(), .sort_index().
Grouping and aggregating — The powerful split-apply-combine paradigm via .groupby().
Merging and joining — Combining multiple DataFrames with methods like pd.merge(), pd.concat().
Time series and date functionality — If your data has dates/times you can use indexing, resampling, rolling windows.
Visualization — Using pandas built-in methods (.plot()) or integrating with Matplotlib/Seaborn.
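As a taste of how a few of these steps fit together, here is a short sketch that reads CSV data, fills a missing value, filters and sorts, and then groups. The CSV text, the Country column, and the extra Audi row are hypothetical and exist only to make the example self-contained; with a real file you would pass its path to pd.read_csv() instead.

```python
import io
import pandas as pd

# Hypothetical CSV content held in memory instead of a file on disk.
csv_text = """Cars,Passings,Country
BMW,3,Germany
Volvo,7,Sweden
Ford,,USA
Audi,5,Germany
"""

df = pd.read_csv(io.StringIO(csv_text))

# Handling missing values: Ford has no Passings value, so fill it with 0.
df["Passings"] = df["Passings"].fillna(0).astype(int)

# Filtering and sorting: keep rows above a threshold, highest first.
busy = df[df["Passings"] > 4].sort_values("Passings", ascending=False)
print(busy["Cars"].tolist())  # ['Volvo', 'Audi']

# Grouping and aggregating: total passings per country.
totals = df.groupby("Country")["Passings"].sum()
print(totals.to_dict())  # {'Germany': 8, 'Sweden': 7, 'USA': 0}
```

Filling a missing count with 0 is a choice made for this example; depending on the data, dropping the row with .dropna() may be more appropriate.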

Summary

In this beginner-friendly guide to pandas we covered:

Why pandas matters for data analysis in Python.
How to install and import pandas.
How to create a basic DataFrame from Python dictionaries.
How to access and select data (rows/columns).
What you should do next to level up your skills.

Whether you are working with small pieces of data or large datasets, mastering pandas will enable you to move from raw data to insight with relative ease.

If you’re ready, dive into one of the next topics (reading CSVs, cleaning data, grouping/aggregating) and start practicing with real datasets. Happy analysing!

Henry Dang
