My First Hackathon: Tackling Clean Energy in Myanmar

Note: this is part 1 of a two-part series I wrote about the hackathon. The second article lives here.

A few weekends ago I went to my first hackathon, the Clean Energy Data Science Challenge organized by the State Department and Booz Allen Hamilton. It was hosted at the Galvanize campus in San Francisco. The hackathon asked participants to use data to map out where people in Myanmar don't have access to electricity and where the best places would be to develop minigrids, which are small local grids not connected to the main grid.

In this post, I'll talk about how my team approached the problem and the solution we came up with for the hackathon. For more about my impressions of the hackathon and what lessons I learned for the future, check out this post.

The challenge

Given that the challenge was to identify places where solar minigrids would have the greatest impact and likelihood of success, I assumed there would be some data about previous attempts to install minigrids in Myanmar or similar countries, with information on the characteristics of each attempt (location, demographics, installation size, etc.) and whether it succeeded or not. With this kind of data, you could try different types of models to figure out what the characteristics of the "ideal" town for a minigrid system would be and then work backward to identify locations in Myanmar meeting these criteria.

It turned out that no such information was available to us. In fact, after talking with some of the domain experts who were on call during the hackathon, it seems that this kind of data may not even exist. That meant we would have to rely on assumptions about what the ideal characteristics of such locations might be, based largely on the advice of the domain experts. I was a bit disappointed, because this meant that I wouldn't have a chance to put some of my new modeling skills to work, but of course in many real-world situations you don't have labeled training data to work with, so it was a valuable challenge nonetheless.

Our approach

Without this data, we needed another way to map out prime locations for minigrids. Instead of starting from scratch and considering every township in the country, we focused on a list put together by the World Bank that identified about 200 villages in Myanmar as good candidates for minigrid systems (although not necessarily solar-powered ones). These villages had mostly been chosen because they were in remote locations where it would be difficult or expensive for the developing national grid to reach. This allowed us to narrow down the field of possible locations considerably, and it was a great find among the piles of data provided by the hackathon organizers.

After talking with several domain experts, including representatives from the World Bank and a couple of people from companies that installed solar minigrid systems in developing countries, we settled on three basic categories of criteria for identifying promising solar minigrid locations:

  • Amount of yearly solar irradiation

  • Ability of local residents to pay for minigrid costs

  • Demand for electricity

For each of these criteria, a higher score would mean a location is more likely to be a successful candidate for a solar minigrid. I focused chiefly on the solar irradiation criterion because the most useful data we wanted - the yearly output in kilowatt-hours that a solar panel would provide in various locations - was most easily accessible through an API, and I had experience getting data from APIs using Python. I wasn't deeply involved in how we measured the other two criteria, so I'll mainly focus on the solar irradiation aspect that I worked on and then talk briefly about the other criteria.

While it might seem straightforward to measure compared to the other criteria, the solar irradiation question presented a few challenges. Simply looking at the average yearly kilowatts per square meter would give us a good way to compare different locations on a relative scale, and that data was available in the datasets provided to us. However, we wanted more precise estimates of how many kilowatt-hours of electricity could be generated with current photovoltaic technology at each of our candidate locations. Luckily, one of my teammates knew exactly how we could get that data. The National Renewable Energy Laboratory has developed a tool called PVWatts where you can input latitude and longitude coordinates along with information about the type of solar installation you are planning and get back the yearly kilowatt-hours that the installation would produce. They also have an API, and someone was nice enough to create a Python wrapper for it called pypvwatts.

After discussing with the environmental planning graduates on my team, we decided on some reasonable assumptions for what the solar installation's module type, percent loss, tilt and azimuth would be. I then put together a Python script that obtained estimates of the yearly AC kWh output that a 1 kW system could produce in each of our candidate locations.
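A script along these lines might look roughly like the sketch below. It is not the code we ran at the hackathon: the assumption values are placeholders (the real ones aren't listed in this post), and the lookup is passed in as a function so the loop can be tried without an API key. As far as I recall, pypvwatts exposes `PVWatts.request(...)` with a result that has an `ac_annual` attribute, so a real fetcher could wrap that call, but treat that as an assumption and check the library's README.

```python
from typing import Callable, Dict, Tuple

# Placeholder installation assumptions. The parameter names follow the
# PVWatts API; the values are illustrative, not the ones the team chose.
SYSTEM_ASSUMPTIONS = {
    "system_capacity": 1,  # kW, matching the 1 kW system described above
    "module_type": 0,      # standard module
    "array_type": 0,       # fixed, open rack
    "losses": 14,          # percent
    "tilt": 20,            # degrees
    "azimuth": 180,        # degrees, facing south
}


def annual_output_by_location(
    locations: Dict[str, Tuple[float, float]],
    fetch_annual_kwh: Callable[..., float],
) -> Dict[str, float]:
    """Return estimated yearly AC kWh for each named (lat, lon) location.

    fetch_annual_kwh is any callable that accepts lat, lon and the system
    assumptions as keyword arguments and returns yearly AC kWh.
    """
    results = {}
    for name, (lat, lon) in locations.items():
        results[name] = fetch_annual_kwh(lat=lat, lon=lon, **SYSTEM_ASSUMPTIONS)
    return results


if __name__ == "__main__":
    # Stand-in fetcher with made-up output, so the sketch runs offline.
    def fake_fetch(lat, lon, **params):
        return 1400.0

    # Hypothetical location name and coordinates.
    print(annual_output_by_location({"Village A": (21.0, 96.0)}, fake_fetch))
```

Keeping the lookup behind a plain function also makes it easy to cache responses, which helps when an API is rate-limited.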

These numbers allowed us to make some ballpark estimates of what the monthly cost would be for a system that met the minimal household level of demand in Myanmar, which we found to be about 350 kWh. Our calculations showed that the monthly cost per household, assuming solar panels that last 20 years and batteries that last 3 years, would range from $5.16 to $5.70. This estimate seemed to be in the range of what the companies at the hackathon were charging, so it served as a good sanity check for our calculations.
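To give a feel for the arithmetic, here is a minimal sketch of one way to spread equipment costs over their lifetimes. It is not our actual spreadsheet: the cost inputs are placeholders, the demand figure is treated as a yearly number purely for illustration (its period isn't stated above), and financing, installation and maintenance are ignored.

```python
def monthly_cost_per_household(
    annual_kwh_per_kw: float,
    annual_demand_kwh: float,
    panel_cost_per_kw: float,
    battery_cost: float,
    panel_life_years: int = 20,
    battery_life_years: int = 3,
) -> float:
    """Straight-line monthly cost of panels plus batteries for one household."""
    # Panel capacity (kW) needed to cover the household's yearly demand.
    kw_needed = annual_demand_kwh / annual_kwh_per_kw
    # Spread each component's cost evenly over the months it lasts.
    panel_monthly = kw_needed * panel_cost_per_kw / (panel_life_years * 12)
    battery_monthly = battery_cost / (battery_life_years * 12)
    return panel_monthly + battery_monthly


if __name__ == "__main__":
    # All inputs below are made-up placeholders, not figures from the hackathon.
    cost = monthly_cost_per_household(
        annual_kwh_per_kw=1400.0,
        annual_demand_kwh=350.0,
        panel_cost_per_kw=1200.0,
        battery_cost=72.0,
    )
    print(round(cost, 2))
```

Because the batteries are replaced far more often than the panels, they can dominate the monthly figure even when their upfront price is lower.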

Measuring local residents' ability to pay was another kind of challenge because virtually no data about income or spending in Myanmar was available. We decided to use data collected from the 2014 census about motorbike, TV, and cell phone ownership as proxies for income. We came up with a formula to weight these ownership levels, with cell phone ownership being the most important because it not only indicated some amount of income but also meant that mobile payments could potentially be used.
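A weighted proxy like this can be sketched in a few lines. The weights below are hypothetical (our actual formula isn't reproduced here); the only property carried over from the description above is that cell phone ownership counts the most.

```python
# Hypothetical weights; cell phone ownership is weighted highest.
OWNERSHIP_WEIGHTS = {"cell_phone": 0.5, "motorbike": 0.3, "tv": 0.2}


def ability_to_pay(ownership_rates, weights=OWNERSHIP_WEIGHTS):
    """Weighted average of ownership rates (each a fraction between 0 and 1)."""
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[item] * ownership_rates[item] for item in weights)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Made-up ownership rates for a single township.
    rates = {"cell_phone": 0.4, "motorbike": 0.5, "tv": 0.1}
    print(ability_to_pay(rates))
```

Dividing by the total weight keeps the score between 0 and 1 even if the weights are later changed and no longer sum to 1.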

Modeling demand was even tougher. We figured the most basic way to estimate demand was by population, but township-level population growth figures weren't available, meaning we could only apply a national growth rate to each township. Without this data, we had to settle for an extremely basic model for demand that was based on a study we found showing how electricity usage grew in a typical Myanmar household after it was connected to the grid. We essentially took the current township populations and projected what demand would look like given the national rate of population growth over the next 6 years. We definitely were not satisfied with this because all it really did was prioritize townships that already had larger populations, but we had to settle for it as we ran out of time.
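The weakness described above is easy to see in a small sketch. The growth rate and per-person usage here are placeholders, not the figures from the study we used; the point is only that applying one national rate to every township scales them all by the same factor, so the ordering never changes.

```python
def projected_demand(population, annual_growth_rate, kwh_per_person, years=6):
    """Project demand by compounding one growth rate over a number of years."""
    future_population = population * (1 + annual_growth_rate) ** years
    return future_population * kwh_per_person


if __name__ == "__main__":
    # Made-up populations, growth rate and per-person usage.
    populations = {"Township A": 50_000, "Township B": 120_000}
    for name, pop in populations.items():
        print(name, round(projected_demand(pop, 0.01, 100.0)))
    # The larger township stays on top regardless of the rate chosen.
```

Township-level growth rates, if they existed, would be the simplest way to let the projection reorder townships.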

Putting it all together

Having come up with at least some basic ways to evaluate different townships on the criteria of solar irradiation levels, ability to pay and demand, we had to figure out how to put all of it together to come up with a ranking of candidate locations. One of the biggest challenges I faced as the data wrangler was how to join together the different sources of data we were using. We had all been working separately on putting together the data for each of the three criteria, and I soon realized we had a big problem.

Some of the data was at the township level but some was at a regional or state level. In addition, the English names of the different townships did not have a universal spelling, since they had been transliterated from Burmese. I decided to use postal codes as unique identifiers for townships that I could join all the data on, but some of the data sources we had used only specified the township name and not the postal code. This unfortunately meant we had to have several team members look through the list of township names that did not exactly match from one data source to the next and use some detective work to track down the proper postal codes. Once we had that, I created a dictionary in Python mapping township spellings to postal codes and was able to use that to join together the different data sources.
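The dictionary-based join can be sketched as follows. This is an illustration rather than the hackathon code: the codes are invented placeholders, the spelling variants are only examples, and the field names are hypothetical.

```python
# Spelling variants mapped to one identifier per township.
# The codes are invented placeholders, not real postal codes.
SPELLING_TO_CODE = {
    "hpa-an": "P001",
    "pa-an": "P001",
    "myeik": "P002",
    "mergui": "P002",
}


def normalize(name):
    """Lowercase a name and collapse whitespace before looking it up."""
    return " ".join(name.lower().split())


def key_by_code(rows, name_field="township"):
    """Re-key rows by code; return the keyed rows and any unmatched names."""
    keyed, unmatched = {}, []
    for row in rows:
        code = SPELLING_TO_CODE.get(normalize(row[name_field]))
        if code is None:
            unmatched.append(row[name_field])
        else:
            keyed[code] = row
    return keyed, unmatched


def join_sources(*keyed_sources, name_field="township"):
    """Merge several code-keyed sources into one record per code."""
    merged = {}
    for source in keyed_sources:
        for code, row in source.items():
            fields = {k: v for k, v in row.items() if k != name_field}
            merged.setdefault(code, {}).update(fields)
    return merged


if __name__ == "__main__":
    solar, _ = key_by_code([{"township": "Hpa-An", "annual_kwh": 1400.0}])
    census, missing = key_by_code(
        [{"township": "Pa-An", "cell_phone": 0.4}, {"township": "Unknown", "cell_phone": 0.1}]
    )
    print(join_sources(solar, census))
    print(missing)  # names that still need a manual lookup
```

Returning the unmatched names separately gives the team a to-do list for the manual detective work instead of silently dropping rows.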

With a cleaned dataset and values for most of our candidate townships, we had to decide how to weight the different criteria we selected. What would be most important to a developer deciding where to install a solar minigrid: the amount of solar irradiation, residents' ability to pay or total demand? Without any data to determine which factors were most important, we decided that developers themselves would be better suited to choosing weights for each criterion. Our system would allow developers to input their choices and see the resulting rankings. For the sake of having a demo ready to present, we chose weights that were about equal between the three criteria.
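One way to turn user-chosen weights into a ranking is sketched below. Min-max scaling is my assumption here (this post doesn't say how the criteria were put on a common scale), and the field names and numbers are made up.

```python
def min_max(values):
    """Scale values to the 0..1 range; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_townships(rows, weights):
    """Return (code, score) pairs sorted from highest to lowest score."""
    total_weight = sum(weights.values())
    # Scale each criterion across all townships so the units don't matter.
    scaled = {c: min_max([row[c] for row in rows]) for c in weights}
    scored = []
    for i, row in enumerate(rows):
        score = sum(weights[c] * scaled[c][i] for c in weights) / total_weight
        scored.append((row["code"], score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up values for three hypothetical townships.
    rows = [
        {"code": "A", "solar": 1400.0, "pay": 0.2, "demand": 100.0},
        {"code": "B", "solar": 1500.0, "pay": 0.5, "demand": 300.0},
        {"code": "C", "solar": 1600.0, "pay": 0.3, "demand": 200.0},
    ]
    equal = {"solar": 1.0, "pay": 1.0, "demand": 1.0}
    print([code for code, _ in rank_townships(rows, equal)])
```

Changing the weights changes the order: with equal weights township B comes first in this toy data, while weighting solar ten times more heavily puts C on top.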

Prototype of our mapping solution showing highest priority areas for solar minigrids in Myanmar

To get a nice visual and interactive map, we used Carto and uploaded the joined data. A screenshot is shown above, and you can check out the prototype version here.

The map that we created would really only be the first step in tackling the problem. In our presentation at the end, we talked about how we would like to get a better sense of what the weights should be on the different factors by gathering evidence from minigrid attempts in a similar country that is more developed, such as Vietnam. In future iterations of our prototype, we would like to include the ability to see more detailed data breakdowns for each town in the country. This was a feature that the winning team had in their final product that really differentiated them from the rest of the competitors.

While our final product was not selected as a winner, I was proud of the way my team came together with different areas of expertise and attacked the problem in a systematic way. If you're interested in the problem, Booz Allen Hamilton is sponsoring a HeroX competition that builds on the work of the hackathon. For more on my overall impressions of the event and lessons learned from the hackathon, check out this post.