Understanding World Bank Data: A Beginner’s Guide to Preparing It for Analysis

We rely on open data portals to gather information for analysis and to write articles for our site. One of our go-to sources is the World Bank. It is one of the most valuable open data sources for global development, economics, and social indicators. Whether you’re exploring GDP growth trends, tracking education rates, or comparing CO₂ emissions, World Bank datasets provide a wealth of information covering nearly every country and region in the world.

However, if you’ve ever downloaded a dataset directly from the World Bank Data Portal, you may have noticed that the file isn’t quite ready for analysis right away. The data usually needs some cleaning and reshaping before you can use it effectively in Google Sheets, Excel, or a programming tool like Python or R.

This guide explains how World Bank data is structured, what kind of information it contains, and how to transform it into a tidy, analysis-ready format.

How World Bank Datasets Are Organized

When you download data from the World Bank, it typically comes as a CSV or Excel (.xls) file. Here’s what you’ll usually find inside:

Column Name Description
Country Name Full name of the country or area (e.g., Canada, World, OECD members).
Country Code ISO3 three-letter code used for joins and merges (e.g., CAN, USA).
Indicator Name Human-readable title for the series (e.g., GDP (current US$), CO₂ emissions (metric tons)).
Indicator Code World Bank code (unique machine identifier) – useful when merging metadata or using the API (example: NY.GDP.MKTP.CD).
Year columns Each year is a separate column (e.g., 2000, 2001). This is the typical wide layout returned by many downloads.
Values Numeric values for each country–year combination. Expect gaps: not every country has data for every year or indicator.

This “wide format” structure is great for viewing data, but not ideal for analysis or visualization. Most analytical tools and workflows prefer a long format, where each row represents a single observation (for example, one country, one indicator, one year, and one value).

Example: The Wide vs. Long Format

Here’s what a typical World Bank dataset looks like when downloaded:

Country Name Country Code Indicator Name Indicator Code 2000 2001 2002 2003
Canada CAN GDP (current US$) NY.GDP.MKTP.CD 742000000000 762000000000 799000000000 823000000000
France FRA GDP (current US$) NY.GDP.MKTP.CD 1320000000000 1360000000000 1400000000000 1440000000000

But for analysis, it’s often better to have your data in this format:

Country Name Country Code Indicator Name Indicator Code Year Value
Canada CAN GDP (current US$) NY.GDP.MKTP.CD 2000 742000000000
Canada CAN GDP (current US$) NY.GDP.MKTP.CD 2001 762000000000
Canada CAN GDP (current US$) NY.GDP.MKTP.CD 2002 799000000000
France FRA GDP (current US$) NY.GDP.MKTP.CD 2000 1320000000000
France FRA GDP (current US$) NY.GDP.MKTP.CD 2001 1360000000000
France FRA GDP (current US$) NY.GDP.MKTP.CD 2002 1400000000000

This is what’s known as tidy data, where each variable has its own column, each observation has its own row, and each value is stored in a single cell.

3. How to Reshape the Data

In Google Sheets or Excel

You can use built-in functions like:

  • =TRANSPOSE() to flip rows and columns.

  • =FILTER() or =QUERY() to extract subsets of data.

  • =TEXTSPLIT() and =CHOOSECOLS() to clean or reorganize columns (especially helpful if you only need a few years or countries).

To reshape from wide to long in Sheets:

  1. Select your dataset.

  2. Use Data → Pivot table → Unpivot (available in Excel’s Power Query or with an add-on in Google Sheets).

  3. Alternatively, use Python’s pandas.melt() or R’s pivot_longer() if you’re comfortable with code.

In Python (pandas)

If you’re using Python, the World Bank format can be easily reshaped using the pandas melt() function:

import pandas as pd

df = pd.read_csv("worldbank_data.csv")

# Transform from wide to long
df_long = df.melt(
    id_vars=["Country Name", "Country Code", "Indicator Name", "Indicator Code"],
    var_name="Year",
    value_name="Value"
)

This turns year columns into rows—perfect for plotting, aggregating, or merging with other data sources.

What Type of Information Is Available

The World Bank’s data catalog contains thousands of indicators organized under major themes such as:

  • Economy & Growth: GDP, inflation, investment rates

  • Education: Literacy, enrollment, expenditure on education

  • Health: Life expectancy, immunization coverage, healthcare spending

  • Environment: CO₂ emissions, forest area, renewable energy use

  • Poverty & Inequality: Income share, poverty headcount ratios

  • Infrastructure & Technology: Internet access, mobile subscriptions, electricity consumption

Most indicators include annual data going back to 1960 or 1970, depending on the series.

Tips for Working with World Bank Data

  • Check for missing values: Not all countries have data for every year or indicator.

  • Look up the metadata: Each indicator has a full description, methodology, and data source on the World Bank website.

  • Standardize country codes: Use ISO3 codes (like “CAN” or “USA”) to merge with other datasets.

  • Filter by regions: You can also download aggregates (e.g., “World”, “OECD”, “Sub-Saharan Africa”) if you don’t need country-level data.

Why Reshaping Matters

Reshaping your data makes it easier to:

  • Visualize trends over time with charts and dashboards.

  • Combine multiple indicators (e.g., GDP vs. CO₂ emissions).

  • Use modern analytical tools that expect data in tidy, long format.

  • Run regressions or forecasts without manual data cleaning every time.

Summary Table

Step Goal Suggested tools / functions
Download Obtain the raw CSV or Excel file (or use the World Bank API for automated pulls). Web: data.worldbank.org • API • CSV / XLSX
Inspect Check header rows, confirm whether the file is wide (years as columns) and find missing-value patterns. Excel / Google Sheets preview • head() in pandas
Reshape (wide → long) Convert year columns into a single Year column and a Value column — this creates tidy data suitable for analysis and plotting. Python: pd.melt() • R: pivot_longer() • Excel: Power Query Unpivot
Clean Handle missing or non-numeric cells, standardize country codes, and coerce year to integer/date types. pandas: dropna(), astype(); Sheets/Excel: filters, IFERROR
Enrich Attach metadata (indicator definitions, units), or merge in regional/group codes for aggregation. Join on Indicator Code or Country Code • Look up World Bank metadata pages
Analyze & Visualize Create time-series charts, compute growth rates, or compare indicators across countries or regions. Sheets charts, Python (matplotlib/plotly), Tableau, Flourish

Quick notes: Use ISO3 country codes when merging datasets; expect incomplete series for smaller countries; consult the World Bank indicator metadata for units and methodology before comparing series.

Takeaway

World Bank datasets offer an incredible foundation for global analysis—but the real insights come once you reshape and clean the data. With a few transformations, you can go from a static spreadsheet to a dynamic dataset ready for visualization, comparison, and storytelling.

Henry Dang

Hey there! I’m Henry, the founder of Fun With Data. I am a communications strategist with an interest in data analytics. I created this website to highlight the importance of data in storytelling and to help others develop the skills and knowledge to become better storytellers with data. Join me on this journey!

Next
Next

How to Find Key Financial Metrics for Canadian-Listed Companies