How to use the US Bureau of Labor Statistics API
If you’re interested in economic data for the United States, there’s really no better source than the Bureau of Labor Statistics. BLS provides a vast repository of labor market information for the US and its territories, most of it down to the county or city level. However, getting to the specific data you want usually means clicking through various sites and using the BLS database query tools. If you can do some basic programming, there’s a more direct route. BLS provides public and private APIs (application programming interfaces) that allow programmers to retrieve any published data in JSON format. To get started, you’ll want to register for a public data API account. Registering gives you access to version 2.0 of the API, which lets you request more data more often (up to 500 queries a day). Version 2.0 also adds functionality such as net and percent change calculations, annual averages and series descriptions in the response.
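As a rough illustration of what registration buys you, here is a minimal sketch of a version 2.0 request. The field names (seriesid, startyear, endyear, registrationkey, catalog, calculations, annualaverage) are the ones documented for the v2 endpoint; the helper names build_v2_payload and post_v2 are made up for this example, and the key is a placeholder for the one BLS emails to you.

```python
import json

API_V2_URL = "https://api.bls.gov/publicAPI/v2/timeseries/data/"


def build_v2_payload(series_ids, start_year, end_year, api_key):
    """Build the JSON body for a registered (version 2.0) request."""
    return {
        "seriesid": list(series_ids),
        "startyear": str(start_year),
        "endyear": str(end_year),
        "registrationkey": api_key,  # placeholder: your own registration key
        "catalog": True,             # ask for series descriptions
        "calculations": True,        # ask for net and percent changes
        "annualaverage": True,       # ask for annual averages
    }


def post_v2(payload):
    """Send the payload to the v2 endpoint and return the decoded JSON."""
    import requests  # third-party; imported here so the helper above needs no extras

    response = requests.post(
        API_V2_URL,
        data=json.dumps(payload),
        headers={"Content-type": "application/json"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


# Example (needs network access and a real key):
# result = post_v2(build_v2_payload(["CUUR0000SA0"], 2011, 2014, "YOUR_KEY"))
```

Keeping the payload in its own function makes it easy to inspect what will be sent before spending one of your daily queries.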
BLS data are available for inflation & prices (cost of living, etc.), employment (and employment projections), unemployment, pay and benefits, spending and time use, productivity, workplace injuries, occupational requirements, regional resources and international import/export price indexes, along with a large amount of historical data. Finding the specific table you want, which BLS identifies by its series ID, can be tricky. You need to know the series ID in order to request data through the BLS Public Data API, and unfortunately there is no central repository of series IDs. The IDs within each survey do follow a consistent format, though. For example, CEU0800000003 is a series ID from the national employment, hours and earnings survey.
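To make the idea of a consistent format concrete, here is a small sketch that slices the example ID into its parts. The field boundaries follow the layout BLS publishes for this survey on its Series ID Formats page (prefix, seasonal adjustment code, supersector, industry, data type); the function name parse_ces_series_id is hypothetical, and other surveys use different layouts.

```python
def parse_ces_series_id(series_id):
    """Split a 13-character national employment (CE) series ID into fields."""
    if len(series_id) != 13 or not series_id.startswith("CE"):
        raise ValueError("expected a 13-character series ID starting with 'CE'")
    return {
        "prefix": series_id[0:2],            # survey prefix
        "seasonal_adjustment": series_id[2],  # S = adjusted, U = unadjusted
        "supersector": series_id[3:5],
        "industry": series_id[5:11],
        "data_type": series_id[11:13],
    }


parts = parse_ces_series_id("CEU0800000003")
for name, code in parts.items():
    print(name, code)
```

Looking up what each code means still requires the BLS code tables, but splitting an ID this way makes it easier to build IDs for neighboring series.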
It’s hard to know where to start with APIs, even if you know which series you’re interested in. Writing an API request and then figuring out how to untangle the output can be frustrating. Fortunately for newbies like myself, BLS provides sample code for making API requests in C#, Java, PHP, Python, R, Ruby/Ruby on Rails, SAS, Unix Command Line, Matlab and Julia. Since I’m interested in Python, I’ve selected the Python sample code as a template for this article:
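What follows is a condensed sketch in the spirit of the BLS Python sample, not a verbatim copy. It posts a JSON payload to the API and writes one text file per series. The function names are my own, and plain tab-separated lines stand in for the table formatting used in the BLS version.

```python
import json
import os


def series_to_text(series):
    """Render one series from the API response as tab-separated lines."""
    lines = ["series id\tyear\tperiod\tvalue"]
    for item in series["data"]:
        # Keep monthly observations only (periods M01 through M12).
        if "M01" <= item["period"] <= "M12":
            lines.append("\t".join(
                [series["seriesID"], item["year"], item["period"], item["value"]]
            ))
    return "\n".join(lines) + "\n"


def write_series_files(response, out_dir="."):
    """Write each series in the response to '<seriesID>.txt'; return the paths."""
    paths = []
    for series in response["Results"]["series"]:
        path = os.path.join(out_dir, series["seriesID"] + ".txt")
        with open(path, "w") as handle:
            handle.write(series_to_text(series))
        paths.append(path)
    return paths


def fetch_sample(series_ids=("CUUR0000SA0", "SUUR0000SA0"),
                 start_year="2011", end_year="2014"):
    """Request the series over the network and return the decoded JSON."""
    import requests  # third-party; only needed for the network call

    payload = json.dumps({
        "seriesid": list(series_ids),
        "startyear": start_year,
        "endyear": end_year,
    })
    response = requests.post(
        "https://api.bls.gov/publicAPI/v2/timeseries/data/",
        data=payload,
        headers={"Content-type": "application/json"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


# Example (needs network access):
# write_series_files(fetch_sample())
```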
The sample code provided by BLS retrieves two Inflation & Prices series, the Consumer Price Index for All Urban Consumers (CUUR0000SA0) and the chained CPI (SUUR0000SA0), for the years 2011 through 2014. It writes the data to text files named after the series they come from. You can simply replace the series IDs and years in the sample code if you like, but you’ll eventually find that limits your options a bit.
It’s better to locate the series you’re interested in and import it into Pandas. In order to find the series name and fields, refer to the Series ID Formats page. Once you know the series ID for the data you would like to request, you can simply append the ID to the version one API URL, https://api.bls.gov/publicAPI/v1/timeseries/data/, and use the requests Python package to return the data in dictionary or list format:
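Here is a minimal sketch of that request, assuming the requests package is installed. The helper names build_url and fetch_series are just for illustration; the URL is the version one endpoint mentioned above.

```python
V1_URL = "https://api.bls.gov/publicAPI/v1/timeseries/data/"


def build_url(series_id):
    """Append the series ID to the version one endpoint."""
    return V1_URL + series_id


def fetch_series(series_id):
    """GET the series and return the decoded JSON as a Python dictionary."""
    import requests  # third-party; imported here so build_url works without it

    response = requests.get(build_url(series_id), timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    data = response.json()
    print("Status:", data["status"])
    return data


# Example (needs network access):
# data = fetch_series("CUUR0000SA0")
```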
This will (hopefully) return Status: REQUEST_SUCCEEDED, indicating that the data has been successfully retrieved. You can proceed to create a Pandas dataframe with your data, but first you’ll need to know what the data looks like and what the keys are. Calling print(data.keys()) lists the top-level keys, which should look something like this: [u'status', u'message', u'Results', u'responseTime'].
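The Results key holds a dictionary whose series entry is a list with one item per requested series. Each item carries the series ID and a data list of observations, so data['Results']['series'][0]['data'] contains the relevant data in the form of dictionaries. If you print the first observation, you will see something like this: {u'footnotes': [{}], u'periodName': u'August', u'period': u'M08', u'value': u'245.519', u'year': u'2017'}. These keys are the fields available in this BLS series. Note that the value and the year both arrive as strings. You can use Pandas to read this data in as a dataframe: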
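A minimal sketch of that step, assuming pandas is installed and that the response has the usual Results, series, data nesting. The sample dictionary is hand-written to mirror the values shown in this article, and the helper names are hypothetical.

```python
def extract_records(data, series_index=0):
    """Return the list of observation dictionaries for one series."""
    return data["Results"]["series"][series_index]["data"]


def to_dataframe(data, series_index=0):
    """Build a dataframe with one row per observation."""
    import pandas as pd  # third-party; only needed for this step

    return pd.DataFrame(extract_records(data, series_index))


# Hand-written stand-in for a real API response.
sample = {
    "status": "REQUEST_SUCCEEDED",
    "Results": {"series": [{
        "seriesID": "CUUR0000SA0",
        "data": [
            {"year": "2017", "period": "M08", "periodName": "August",
             "value": "245.519", "footnotes": [{}]},
            {"year": "2017", "period": "M07", "periodName": "July",
             "value": "244.786", "footnotes": [{}]},
        ],
    }]},
}

# Example:
# df = to_dataframe(sample)
# print(df.head())
```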
The returned data is not yet formatted for analysis (the values are still strings, for example), but printing the first few rows gives a quick look at it:
footnotes period periodName value year
0 [{}] M08 August 245.519 2017
1 [{}] M07 July 244.786 2017
2 [{}] M06 June 244.955 2017
3 [{}] M05 May 244.733 2017
4 [{}] M04 April 244.524 2017
As you can see, once you know which series you’d like to retrieve, Python makes it fairly simple to read in the JSON data and convert it into a Pandas dataframe in only about five or six lines of code.
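Because the API returns everything as text, one possible next step is to turn each observation into a real date and a float before analysis. This is only a sketch of one way to do it: the function name tidy_records is made up, and it assumes monthly data with periods M01 through M12.

```python
from datetime import date


def tidy_records(records):
    """Convert raw observations into sorted rows with a date and a float value."""
    rows = []
    for rec in records:
        period = rec["period"]
        if not ("M01" <= period <= "M12"):
            continue  # skip anything that is not a calendar month
        rows.append({
            "date": date(int(rec["year"]), int(period[1:]), 1),
            "value": float(rec["value"]),
        })
    # The API lists the newest observation first; sort oldest first instead.
    return sorted(rows, key=lambda row: row["date"])


raw = [
    {"year": "2017", "period": "M08", "value": "245.519"},
    {"year": "2017", "period": "M07", "value": "244.786"},
]
for row in tidy_records(raw):
    print(row["date"], row["value"])

# With pandas installed, the rows can go straight into a dataframe:
# df = pd.DataFrame(tidy_records(raw)).set_index("date")
```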
Using the Python BLS Library
There is another option, though I’ll admit I haven’t spent much time with it. The new Python library for the Bureau of Labor Statistics API is still being developed, but it looks promising. In order to retrieve data, use the get_series() function, which has three arguments: a series ID (or multiple IDs), a starting year and an end year. Here’s a code snippet using the Inflation and Price series again:
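A minimal sketch of that call, assuming the library is installed and imported under the name bls, and passing the three arguments in the order described above. The wrapper function get_cpi_series and its default years are my own choices for illustration.

```python
def get_cpi_series(start_year=2005, end_year=2014, series_id="CUUR0000SA0"):
    """Fetch one series through the bls library for the given year range."""
    if end_year < start_year:
        raise ValueError("end_year must not be earlier than start_year")

    import bls  # third-party; assumed import name for the library

    # Arguments: series ID, starting year, end year.
    return bls.get_series(series_id, start_year, end_year)


# Example (needs network access and the library installed):
# series = get_cpi_series()
# print(series.head())
```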
This will return a pandas Series indexed by date, with the series ID as its name:
2005-01-01 190.7
2005-02-01 191.8
2005-03-01 193.3
2005-04-01 194.6
2005-05-01 194.4
Name: CUUR0000SA0, dtype: float64
According to the developer, this module doesn’t yet support more than ten years of data. Unless you specify a year argument, the get_series() function will return the most recent ten years of available data. I’m looking forward to seeing how this library grows and becomes refined. It looks like it could be incredibly useful.
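If you need a longer history, one workaround is to split the range into windows of at most ten years and request each window separately. The sketch below only does the splitting; year_chunks is a hypothetical helper, and treating the limit as ten calendar years inclusive is an assumption on my part.

```python
def year_chunks(start_year, end_year, max_years=10):
    """Split an inclusive year range into consecutive windows of max_years."""
    if end_year < start_year:
        raise ValueError("end_year must not be earlier than start_year")
    chunks = []
    year = start_year
    while year <= end_year:
        chunk_end = min(year + max_years - 1, end_year)
        chunks.append((year, chunk_end))
        year = chunk_end + 1
    return chunks


for first, last in year_chunks(1995, 2017):
    print(first, last)

# Each window could then be fetched and the pieces joined, for example:
# pieces = [bls.get_series("CUUR0000SA0", a, b) for a, b in year_chunks(1995, 2017)]
# full = pd.concat(pieces).sort_index()
```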
Useful Links
Sample Python Code for BLS API
BLS Public Data API Signatures
Originally published at danstrong.tech.