Section I: Introduction, Configuration and Setup
Section II: Aggregating EIA Data with Python
Section III: Cleaning, Visualization with Pandas, Matplotlib
Section IV: Understanding SVG Files with BeautifulSoup
Section V: Geoplotting State-Level US Data to Create Heatmaps
The EIA website hosts quite a large range of data on energy and the energy industry. From the start, I knew I wanted to find state-level data on renewable energy production and data on residential electricity. However, out of curiosity I created a short code snippet that utilizes the API's keyword search call. The call simply returns metadata on all series of data that are relevant to a provided keyword. Using requests.get() and .text to return the text of the data, what is returned is JSON data in string format. This can easily be converted into JSON format with json.loads(). Once in JSON format, data can be accessed by indexing your way down the response's element tree.
Using search_keywords() returns the names of all relevant series, along with their corresponding series IDs. These IDs are used to compile specific data.
searches = {}
def search_keywords():
query = raw_input('Search for Keyword: ')
if query == 'x' or query == 'q':
pass
else:
base_url = 'http://api.eia.gov/search/?search_term=name&search_value="{}"&rows_per_page={}'
r = requests.get(base_url.format(query, '10000'))
master_dict = json.loads(r.text)
docs = master_dict['response']['docs']
print('Search Results Found:')
for series in docs:
print {series['name']: series['series_id']}
Search for Keyword: renewable energy
Search Results Found:
{'Renewable energy production, Iowa': 'SEDS.REPRB.IA.A'}
{'Renewable energy production, Ohio': 'SEDS.REPRB.OH.A'}
{'Renewable energy production, Utah': 'SEDS.REPRB.UT.A'}
{'Renewable energy production, Idaho': 'SEDS.REPRB.ID.A'}
{'Renewable energy production, Maine': 'SEDS.REPRB.ME.A'}
def search_series(query, save_as):
base_url = 'http://api.eia.gov/search/?search_term=series_id&search_value="{}"&rows_per_page={}'
r = requests.get(base_url.format(query, '10000'))
docs = json.loads(r.text)['response']['docs']
print('Search Results Found:')
for series in docs:
print {series['name']: series['series_id']}
save_search = raw_input('Would you like to save these search results? ')
if 'y' in save_search.lower():
searches[save_as] = docs
series_index = [{x['name'].split(', ')[1]: x['series_id']} for x in searches[save_as]]
df = pd.DataFrame(series_index)
df = pd.DataFrame(df.stack()).reset_index()
df = df.drop('level_0', axis=1)
df_i[save_as] = df.rename(columns={'level_1': 'State', 0: save_as})
factors = [
['SEDS.RETCB', 'Renewable Consumption'],
['SEDS.REPRB', 'Renewable Production'],
['SEDS.TEPRB', 'Total Energy Production'],
['SEDS.ESRCD', 'Price of Residential Electricity'],
]
for factor in factors:
search_series(factor[0], factor[1])
# helper functions
def query_url(url):
r = requests.get(url)
return json.loads(r.text)
def nav_json(j):
return j['series'][0]['data']
def compile_state_data(series):
series_df = pd.DataFrame()
try:
for state_data in df_i[series][series]:
url = 'http://api.eia.gov/series/?series_id={}&api_key={}&out=json'.format(state_data, api_key)
data = nav_json(query_url(url))
df_inst = pd.DataFrame(data, columns=['Year', series])
df_inst['Year'] = [datetime.datetime.strptime(str(x), '%Y').year for x in df_inst['Year']]
df_inst['State'] = ''.join(df_i[series]['State'][df_i[series][series] == state_data])
df_inst = df_inst[['State', 'Year', series]]
series_df = df_inst if len(series_df) == 0 else series_df.append(df_inst, ignore_index=True)
return series_df
except Exception as e:
print('Error:', str(e), 'is not a recognized EIA Series ID.')
Total_Energy = compile_state_data('Total Energy Production')
Renewable_Energy = compile_state_data('Renewable Production')
Res_Electricity_Price = compile_state_data('Price of Residential Electricity')
keys = ['State', 'Year']
df = pd.merge(Total_Energy, Renewable_Energy, on=keys).merge(Res_Electricity_Price, on=keys)
df['Renewable, Percentage of Production'] = df['Renewable Production'] / df['Total Energy Production']
outFile = open('df.txt', 'wb')
pickle.dump(df, outFile)
outFile.close()