Understanding World Bank Data: A Beginner’s Guide to Preparing It for Analysis

Nov 5

We rely on open data portals to gather information for analysis and to write articles for our site. One of our go-to sources is the World Bank. It is one of the most valuable open data sources for global development, economics, and social indicators. Whether you’re exploring GDP growth trends, tracking education rates, or comparing CO₂ emissions, World Bank datasets provide a wealth of information covering nearly every country and region in the world.

However, if you’ve ever downloaded a dataset directly from the World Bank Data Portal, you may have noticed that the file isn’t quite ready for analysis right away. The data usually needs some cleaning and reshaping before you can use it effectively in Google Sheets, Excel, or a programming tool like Python or R.

This guide explains how World Bank data is structured, what kind of information it contains, and how to transform it into a tidy, analysis-ready format.

How World Bank Datasets Are Organized

When you download data from the World Bank, it typically comes as a CSV or Excel (.xls) file. Here’s what you’ll usually find inside:

  
      Column Name
      Description
    
      Country Name
      Full name of the country or area (e.g., Canada, World, OECD members).
    
      Country Code
      ISO3 three-letter code used for joins and merges (e.g., CAN, USA).
    
      Indicator Name
      Human-readable title for the series (e.g., GDP (current US$), CO₂ emissions (metric tons)).
    
      Indicator Code
      World Bank code (unique machine identifier) – useful when merging metadata or using the API (example: NY.GDP.MKTP.CD).
    
      Year columns
      Each year is a separate column (e.g., 2000, 2001). This is the typical wide layout returned by many downloads.
    
      Values
      Numeric values for each country–year combination. Expect gaps: not every country has data for every year or indicator.

This “wide format” structure is great for viewing data, but not ideal for analysis or visualization. Most analytical tools and workflows prefer a long format, where each row represents a single observation (for example, one country, one indicator, one year, and one value).

Example: The Wide vs. Long Format

Here’s what a typical World Bank dataset looks like when downloaded:

  
    

  
      Country Name
      Country Code
      Indicator Name
      Indicator Code
      2000
      2001
      2002
      2003
    

  
      Canada
      CAN
      GDP (current US$)
      NY.GDP.MKTP.CD
      742000000000
      762000000000
      799000000000
      823000000000
    

      France
      FRA
      GDP (current US$)
      NY.GDP.MKTP.CD
      1320000000000
      1360000000000
      1400000000000
      1440000000000
    





  

But for analysis, it’s often better to have your data in this format:

  
    

  
      Country Name
      Country Code
      Indicator Name
      Indicator Code
      Year
      Value
    

  
      Canada
      CAN
      GDP (current US$)
      NY.GDP.MKTP.CD
      2000
      742000000000
    

      Canada
      CAN
      GDP (current US$)
      NY.GDP.MKTP.CD
      2001
      762000000000
    

      Canada
      CAN
      GDP (current US$)
      NY.GDP.MKTP.CD
      2002
      799000000000
    

      France
      FRA
      GDP (current US$)
      NY.GDP.MKTP.CD
      2000
      1320000000000
    

      France
      FRA
      GDP (current US$)
      NY.GDP.MKTP.CD
      2001
      1360000000000
    

      France
      FRA
      GDP (current US$)
      NY.GDP.MKTP.CD
      2002
      1400000000000
    


  

This is what’s known as tidy data, where each variable has its own column, each observation has its own row, and each value is stored in a single cell.

3. How to Reshape the Data

In Google Sheets or Excel

You can use built-in functions like:

=TRANSPOSE() to flip rows and columns.
=FILTER() or =QUERY() to extract subsets of data.
=TEXTSPLIT() and =CHOOSECOLS() to clean or reorganize columns (especially helpful if you only need a few years or countries).

To reshape from wide to long in Sheets:

Select your dataset.
Use Data → Pivot table → Unpivot (available in Excel’s Power Query or with an add-on in Google Sheets).
Alternatively, use Python’s pandas.melt() or R’s pivot_longer() if you’re comfortable with code.

In Python (pandas)

If you’re using Python, the World Bank format can be easily reshaped using the pandas melt() function:

import pandas as pd

df = pd.read_csv("worldbank_data.csv")

# Transform from wide to long
df_long = df.melt(
    id_vars=["Country Name", "Country Code", "Indicator Name", "Indicator Code"],
    var_name="Year",
    value_name="Value"
)

This turns year columns into rows—perfect for plotting, aggregating, or merging with other data sources.

What Type of Information Is Available

The World Bank’s data catalog contains thousands of indicators organized under major themes such as:

Economy & Growth: GDP, inflation, investment rates
Education: Literacy, enrollment, expenditure on education
Health: Life expectancy, immunization coverage, healthcare spending
Environment: CO₂ emissions, forest area, renewable energy use
Poverty & Inequality: Income share, poverty headcount ratios
Infrastructure & Technology: Internet access, mobile subscriptions, electricity consumption

Most indicators include annual data going back to 1960 or 1970, depending on the series.

Tips for Working with World Bank Data

Check for missing values: Not all countries have data for every year or indicator.
Look up the metadata: Each indicator has a full description, methodology, and data source on the World Bank website.
Standardize country codes: Use ISO3 codes (like “CAN” or “USA”) to merge with other datasets.
Filter by regions: You can also download aggregates (e.g., “World”, “OECD”, “Sub-Saharan Africa”) if you don’t need country-level data.

Why Reshaping Matters

Reshaping your data makes it easier to:

Visualize trends over time with charts and dashboards.
Combine multiple indicators (e.g., GDP vs. CO₂ emissions).
Use modern analytical tools that expect data in tidy, long format.
Run regressions or forecasts without manual data cleaning every time.

Summary Table

Step	Goal	Suggested tools / functions
Download	Obtain the raw CSV or Excel file (or use the World Bank API for automated pulls).	Web: data.worldbank.org • API • CSV / XLSX
Inspect	Check header rows, confirm whether the file is wide (years as columns) and find missing-value patterns.	Excel / Google Sheets preview • head() in pandas
Reshape (wide → long)	Convert year columns into a single Year column and a Value column — this creates tidy data suitable for analysis and plotting.	Python: `pd.melt()` • R: `pivot_longer()` • Excel: Power Query Unpivot
Clean	Handle missing or non-numeric cells, standardize country codes, and coerce year to integer/date types.	pandas: `dropna()`, `astype()`; Sheets/Excel: filters, IFERROR
Enrich	Attach metadata (indicator definitions, units), or merge in regional/group codes for aggregation.	Join on Indicator Code or Country Code • Look up World Bank metadata pages
Analyze & Visualize	Create time-series charts, compute growth rates, or compare indicators across countries or regions.	Sheets charts, Python (matplotlib/plotly), Tableau, Flourish

Quick notes: Use ISO3 country codes when merging datasets; expect incomplete series for smaller countries; consult the World Bank indicator metadata for units and methodology before comparing series.

Takeaway

World Bank datasets offer an incredible foundation for global analysis—but the real insights come once you reshape and clean the data. With a few transformations, you can go from a static spreadsheet to a dynamic dataset ready for visualization, comparison, and storytelling.

Henry Dang

Hey there! I’m Henry, the founder of Fun With Data. I am a communications strategist with an interest in data analytics. I created this website to highlight the importance of data in storytelling and to help others develop the skills and knowledge to become better storytellers with data. Join me on this journey!

Column Name	Description
Country Name	Full name of the country or area (e.g., Canada, World, OECD members).
Country Code	ISO3 three-letter code used for joins and merges (e.g., CAN, USA).
Indicator Name	Human-readable title for the series (e.g., GDP (current US$), CO₂ emissions (metric tons)).
Indicator Code	World Bank code (unique machine identifier) – useful when merging metadata or using the API (example: NY.GDP.MKTP.CD).
Year columns	Each year is a separate column (e.g., 2000, 2001). This is the typical wide layout returned by many downloads.
Values	Numeric values for each country–year combination. Expect gaps: not every country has data for every year or indicator.

Understanding World Bank Data: A Beginner’s Guide to Preparing It for Analysis

How World Bank Datasets Are Organized

Example: The Wide vs. Long Format

3. How to Reshape the Data

In Google Sheets or Excel

In Python (pandas)

What Type of Information Is Available

Tips for Working with World Bank Data

Why Reshaping Matters

Summary Table

Takeaway

About Us

Support our Work

Understanding World Bank Data: A Beginner’s Guide to Preparing It for Analysis

How World Bank Datasets Are Organized

Example: The Wide vs. Long Format

3. How to Reshape the Data

In Google Sheets or Excel

In Python (pandas)

What Type of Information Is Available

Tips for Working with World Bank Data

Why Reshaping Matters

Summary Table

Takeaway

How to Find Key Financial Metrics for Canadian-Listed Companies

About Us

Support our Work