Methane Flaring Impact on Vulnerable Communities (Master's in Business Analytics - Final Project)

The problem

A pilot project developed in conjunction with RMI as an interactive tool mapping California’s most harmful methane flaring and the socially vulnerable communities least able to cope with that hazard.

The non-profit climate consultancy RMI had a project in mind to explore and map the social impact of methane flaring, but the project was only in the pilot phase and awaiting additional funding. They wanted to develop a proof-of-concept to demonstrate the issue and engage stakeholders such as funders and people in the affected communities.

I completed this as the final project of my Master’s in Business Analytics degree at the University of Montana.

Approach aka the story

The Torrance Refinery sprawls across 700 acres in Torrance, California, nestled in the larger Los Angeles metropolitan area. The facility employs hundreds of people and produces a wide variety of petroleum products: diesel fuel, aviation fuel, and nearly a tenth of California’s total gasoline. It also produces waste gas, mostly methane, that is burned — or flared — on site when selling the gas isn’t economical.

Facilities like the Torrance Refinery have powered unprecedented mobility and high living standards for millions of people. But like so many aspects of fossil fuels, there is a darker side.

About 15,000 people live just over a mile from the refinery, and many more work or go to school in the area. Children go to school at Edison Elementary School and teenagers at North High School. Residents walk their dogs at Columbia Park and employees go to work at Honda’s North American headquarters, a UPS facility, and the Torrance Fire Department, Station 3. 

Every time the Torrance Refinery flares off its waste gas, these people are potentially exposed to dangerous chemicals in the air. Flaring methane is good from a global emissions perspective: the combustion transforms methane into carbon dioxide, which has roughly 80 times less global warming potential than the same amount of methane over a 20-year period. But the flared methane is also accompanied by a cocktail of other dangerous chemicals and carcinogens like benzene, toluene, and xylene, and distributed via tiny particles that can harm lungs and other parts of our bodies. Plus, the very idea of flaring as a climate solution is tenuous at best. Recent research on the major gas-producing areas in the United States shows that methane emissions from flaring are five times higher than current estimates, due to inefficient combustion or methane gas simply not being combusted at all.

Particulate matter and toxins released with the flares have been associated with a variety of negative health effects, including pediatric asthma, congenital heart defects, and higher rates of low-birth-weight babies and preterm births. This is not a small-scale problem limited to rural populations clustered around oil & gas development or occasional urban facilities like Torrance Refinery, either: an estimated 18 million Americans live within 1 mile of an oil & gas well.

From a global climate perspective, flaring has been over-promised and under-delivered. And at local scales, it clearly can cause real and lasting damage, often to communities already saddled with other social and environmental burdens. It is a “solution” that doesn’t deliver on its stated intent, while imposing a host of detrimental effects on communities living nearby.

For the final project of my Master’s in Business Analytics at the University of Montana, I partnered with RMI to explore the social impact of methane flaring, using California as a case study. Understanding where the most harmful flares are occurring and where communities near these flares are most vulnerable to the harmful effects is crucial for mobilizing environmental justice and climate action at several scales. Regulators, vulnerable affected communities, and social/environmental justice advocates alike can benefit from this work to advocate for a healthier future.

What is vulnerability in this case? And how do you quantify impact from a flare, anyway?

Existing research makes clear that methane flaring negatively impacts nearby communities, and that it’s not an isolated problem. The burden is also not equally distributed among Americans. There is evidence that minority groups are disproportionally affected by flaring, including in Texas and California.

This disproportionate impact is related to the concept of social vulnerability, which, in the EPA’s words, is “a community’s ability to anticipate, cope with, and recover from adverse impacts.” This vulnerability can come from a variety of factors like race, educational background, age, the prevalence of disease in a community, and others. Picture how marginalized groups have less political power to fight injustice, or how the very young and very old in a poor community are less able to respond to stressors like a prolonged heat wave. What’s more, vulnerability depends on the context of a specific adverse impact — a community may not be as vulnerable to flooding, for instance, as they are to air pollution. 

There is no standard approach for quantifying the social and environmental impact from methane flaring. It is related to air pollution more broadly but is its own unique hazard. In consultation with RMI, I determined that the impact of a flare could be broken down into three variables: the total affected population, the social vulnerability of that population related to air pollution, and the volume of flared gas. To account for the dispersed nature of air pollution, I analyzed these impacts using three different buffer zones around each flare (radii of 100m, 1,000m, and 2,000m), following the standard approach for modeling in this area.
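As a rough illustration of how the three inputs could be gathered for each buffer radius, here is a minimal Python sketch. It is not the project's code: it simplifies each block group to a single centroid point and uses great-circle distance, whereas the project worked with polygons in geopandas. The function names, field names, coordinates, and values are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def flare_inputs(flare, block_groups, radius_m):
    """Collect the three impact inputs for one flare at one buffer radius.

    Assumption: a block group counts as affected when its centroid lies
    within radius_m of the flare (a simplification of polygon overlap).
    """
    nearby = [
        bg for bg in block_groups
        if haversine_m(flare["lat"], flare["lon"], bg["lat"], bg["lon"]) <= radius_m
    ]
    population = sum(bg["population"] for bg in nearby)
    # Population-weighted mean EJ Index of the affected block groups
    if population:
        ej_index = sum(bg["ej_index"] * bg["population"] for bg in nearby) / population
    else:
        ej_index = 0.0
    return {"population": population, "ej_index": ej_index, "volume": flare["volume"]}


# Made-up flare and block group centroids, for illustration only
flare = {"lat": 33.85, "lon": -118.33, "volume": 1.2}
block_groups = [
    {"lat": 33.851, "lon": -118.331, "population": 1200, "ej_index": 80.0},
    {"lat": 33.860, "lon": -118.330, "population": 900, "ej_index": 60.0},
    {"lat": 33.900, "lon": -118.330, "population": 2000, "ej_index": 40.0},
]

for radius in (100, 1_000, 2_000):
    print(radius, flare_inputs(flare, block_groups, radius))
```

Even in this toy setup, the affected population and average vulnerability shift as the radius grows, which is why the choice of buffer size matters later on.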

For this project I married infrared satellite observations of methane flaring from 2012 to 2021, captured by NOAA’s VIIRS instrument, with data on population demographics and social vulnerability courtesy of the EPA and the U.S. Census. Before diving into the analysis, I need to explain two important concepts: how I defined social vulnerability, and Census block groups.

I conceptualized social vulnerability using the EPA’s environmental justice (EJ) index for particulate matter 2.5 (PM2.5), as it related more closely to my interest in social vulnerability related to air quality than any of the other pre-existing EJ Indices available from the EPA. Each EJ Index combines demographic information with a single environmental factor, like air pollution, and represents how vulnerable a community is to that hazard in a way that allows for comparing block groups across the United States. For simplicity’s sake, I’ll refer to this measure as the EJ Index from here forward.

This social data is mapped to the level of Census block groups, which is a geographical unit that generally contains between 600 and 3,000 people (see the image below for an overview of block groups and how the Census divides up the United States).

I answered three research questions in this project in an effort to understand the degree to which methane flaring is affecting vulnerable communities and provide stakeholders involved in every aspect of flaring — from regulators to energy producers to affected communities and advocates — a tool that can aid discussions around better, healthier futures for all involved. 

  1. Which flares are the most impactful? 

  2. Which block groups are being the most impacted?

  3. Is there evidence of a disproportionate impact of flares on minorities? 

Let’s break each of these down with examples.

Which flares are the most impactful?

There is no standard approach for quantifying impact from flaring, and there’s also no definition of what variables should most influence this idea of “impact”. Should it be based solely on the volume of methane flared? Perhaps the total population affected? And how should social vulnerability be considered, if at all? After all, not all affected populations are created equal from a vulnerability standpoint. To turn this uncertainty into a benefit, I included all three of these variables (methane volume, social vulnerability to air pollution, and affected population) to calculate a comprehensive score for each individual flare that I call the Impact Metric.
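The exact formula behind the Impact Metric is not spelled out here, so the following Python sketch shows just one plausible construction: rescale each variable to a 0 to 1 range, then take a weighted sum with user-chosen weights. The min-max scaling, the function names, and the three example flares are assumptions made for illustration.

```python
def min_max(values):
    """Rescale numbers to the 0-1 range (all zeros if every value is equal)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def impact_metric(flares, w_volume, w_vulnerability, w_population):
    """Weighted sum of the three rescaled inputs; weights are normalized to sum to 1."""
    total_weight = w_volume + w_vulnerability + w_population
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    vol = min_max([f["volume"] for f in flares])
    ej = min_max([f["ej_index"] for f in flares])
    pop = min_max([f["population"] for f in flares])
    return {
        f["name"]: (w_volume * v + w_vulnerability * e + w_population * p) / total_weight
        for f, v, e, p in zip(flares, vol, ej, pop)
    }


def ranking(scores):
    """Flare names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical flares: A is high-volume and rural, B is low-volume and urban
flares = [
    {"name": "A", "volume": 10.0, "ej_index": 30.0, "population": 500},
    {"name": "B", "volume": 2.0, "ej_index": 90.0, "population": 20_000},
    {"name": "C", "volume": 5.0, "ej_index": 60.0, "population": 5_000},
]

volume_heavy = ranking(impact_metric(flares, 0.8, 0.1, 0.1))
people_heavy = ranking(impact_metric(flares, 0.1, 0.45, 0.45))
print(volume_heavy)
print(people_heavy)
```

With these made-up numbers the top-ranked flare flips between A and B depending on the weights, a small-scale version of the instability the dashboard is designed to expose.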

To visualize these scores, I built an interactive dashboard in Tableau that allows users to weight each of the three variables depending on how they believe impact should be calculated, as well as choose the buffer size around each flare that should be considered for calculating harm. Hover over any flare and you’ll see a variety of helpful information for that site, like its Impact Metric, the county it’s in, the total affected population, the percent of that affected population that are minorities, and the last three years of flaring volume benchmarked against the average for all flares in California.

Intriguingly, the impact scores vary widely depending on how the variables are weighted and the buffer size chosen. There are more than 100 flaring sites in the study period, from relatively high-volume flares in Contra Costa County, to rural flares in Kern County, to flares in urban Los Angeles surrounded by large populations and high levels of social vulnerability.

I also built in the ability to limit the total number of flares shown on the map, so it’s easy for a user to see the top ten or top twenty flares by highest Impact Metric. Last but not least, a user can click on any object on the map to bring up an aerial view courtesy of Google Maps, which is perfect for visualizing a production facility or the surrounding area. Altogether, these features let users change the relative influence of the variables that make up the Impact Metric as well as the area deemed harmful around each flare, helping to make the point that impact is a subjective concept.

At first glance, this seems more hazy than helpful — if the rankings are so unstable, what value do they provide? But at a deeper level, this variability is valuable evidence for RMI and other stakeholders involved with methane flaring. It shows that any estimation of environmental and social impact from flaring needs to be built on a sophisticated understanding of the variables that go into calculating impact. If the rankings changed only slightly with these user choices, then perhaps less sophisticated modeling like this pilot project would suffice. But this is not a situation where there is an objective smoking gun to identify the “worst” flares. 

Buffer size clearly matters, which supports integrating meteorological data and other on-the-ground sensor data to identify more precisely where the air pollution from a flare is traveling. Likewise, the relative weights of affected population, social vulnerability, and flare volume all influence the rankings. This points to the value of building more viewpoints into this work; tapping the expertise of industry professionals, government regulators, affected communities, and social/environmental justice advocates can help RMI identify the variables, and the most realistic relative weights for each, that should make up a composite impact score. Since the ranking of which flares are most impactful depends heavily on how that impact is calculated, it’s all the more imperative that the right voices have a say in determining that calculation. In all, this deeper work could not only model the social impact of flaring more accurately, but also provide stakeholders of all kinds with the specific data and tools necessary to advocate for a healthier future.

Block Groups Dashboard

I created a second dashboard, similar to the first, that focuses on visualizing which block groups are being the most impacted. Whereas the first dashboard attempts to identify the flares causing the most harm, this dashboard is useful for understanding the block groups suffering the most harm. This additional level of detail is helpful for visualizing differential impacts within the impact area of specific flares, as there are often wide differences in population and social vulnerability among the block groups affected by a flare.

The setup and functionality for this dashboard are the same as the Flares Dashboard, except I’ve aggregated the relevant variables with block groups as the unit of analysis rather than the flare buffers.
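To make the change in unit of analysis concrete, here is a small Python sketch of regrouping flare-level data by block group. It assumes a precomputed list of (flare, block group) pairs whose buffer and polygon overlap; that list, the identifiers, and the choice to sum volume and count flares are hypothetical and not taken from the project.

```python
from collections import defaultdict


def aggregate_by_block_group(overlaps, flare_volume):
    """Sum flared volume and count flares for every block group touched by a buffer.

    overlaps: iterable of (flare_id, block_group_id) pairs, one per overlap.
    flare_volume: dict mapping flare_id to its flared gas volume.
    """
    totals = defaultdict(lambda: {"flare_count": 0, "volume": 0.0})
    for flare_id, bg_id in overlaps:
        totals[bg_id]["flare_count"] += 1
        totals[bg_id]["volume"] += flare_volume[flare_id]
    return dict(totals)


# Made-up overlaps: block group BG2 sits inside the buffers of two flares
overlaps = [("F1", "BG1"), ("F1", "BG2"), ("F2", "BG2")]
flare_volume = {"F1": 3.0, "F2": 1.5}

by_block_group = aggregate_by_block_group(overlaps, flare_volume)
print(by_block_group)
```

Each block group's own population and EJ Index could then be combined with these totals in the same weighted fashion as the flare-level metric.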

Similar to the Flares Dashboard, allowing the user to control the parameters that define the Block Group Impact Metric helps convey the subjectivity in defining impact. This dashboard could be especially valuable for affected communities and social/environmental justice advocates who want to better understand flaring impacts on specific communities.

Analysis of Disproportionate Impact

In addition to the interactive dashboards, I employed statistical analysis to investigate the impact of methane flaring on BIPOC communities relative to white communities. Through this analysis, I explored whether there is evidence of a disproportionate impact on these minority communities. Put simply, are minorities more likely to live within a certain buffer zone of a flare than whites? To test this, I first filtered the overall California population down to just people living in block groups that either directly contain a flare or sit adjacent to a block group that does. This sharpened the analysis by excluding the large proportion of California, in terms of both area and population, unaffected by flaring.

Then, for a variety of flare buffer sizes I calculated a weighted proportion of minorities living within the buffer zone relative to the proportion of minorities living outside the zone; see the figure below.
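As a rough illustration of the calculation just described, the Python sketch below splits each block group's residents between "inside" and "outside" according to the share of the block group's area that falls within a buffer. That areal weighting (which assumes people are spread evenly across a block group), the field names, and the numbers are assumptions for illustration only.

```python
def weighted_proportions(block_groups):
    """Return (in_buffer, out_of_buffer) minority proportions.

    Each block group contributes frac_in_buffer of its people to the
    in-buffer totals and the remainder to the out-of-buffer totals.
    """
    in_minority = sum(bg["minority"] * bg["frac_in_buffer"] for bg in block_groups)
    in_total = sum(bg["total"] * bg["frac_in_buffer"] for bg in block_groups)
    out_minority = sum(bg["minority"] * (1 - bg["frac_in_buffer"]) for bg in block_groups)
    out_total = sum(bg["total"] * (1 - bg["frac_in_buffer"]) for bg in block_groups)
    if in_total == 0 or out_total == 0:
        raise ValueError("need population both inside and outside the buffer")
    return in_minority / in_total, out_minority / out_total


# Made-up block groups
block_groups = [
    {"total": 1000, "minority": 800, "frac_in_buffer": 0.5},
    {"total": 2000, "minority": 1000, "frac_in_buffer": 0.25},
    {"total": 1000, "minority": 700, "frac_in_buffer": 0.0},
]

p_in, p_out = weighted_proportions(block_groups)
print(f"in buffer: {p_in:.3f}, out of buffer: {p_out:.3f}, gap: {p_in - p_out:.3f}")
```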

For instance, at a buffer radius of 2,000m, the weighted proportion of minorities living inside that impact zone is 0.78, compared with 0.70 outside it. In other words, for this time period, 78% of the population living within 2,000m of a flare are minorities, while only 70% of those living farther than 2,000m from flares are minorities. This leads to the logical next question: is this difference of 8 percentage points actually meaningful?

I answered this question with a statistical technique called permutation testing. In a nutshell, permutation testing uses simulations to build a fake world under an assumption of your choosing. You can then compare the results of the simulated world (and its specific assumptions about your data) to your actual results. How unlikely the actual results are compared to the simulated results gives you valuable information about whether the actual data behaves in line with your assumption…or not.

I’ll illustrate with an example. In this case, I set up the simulations to calculate the in-buffer and out-buffer weighted proportions of minorities under the specific assumption that the proportion of a block group that intersected with a given flare buffer has no bearing whatsoever on the proportion of minority individuals living in the block group. That is, I wanted to simulate what the difference of in-buffer to out-buffer proportions would look like if I assumed that block groups with higher concentrations of minorities were no more likely to be near flares than block groups with higher white populations. 
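A minimal sketch of this kind of permutation test is shown below, written in Python rather than the R used for the project. It assumes the null world is built by shuffling the buffer-overlap fractions across block groups, which breaks any link between proximity to flares and demographics. The function names, the default of 2,000 permutations, the add-one p-value correction, and the synthetic data are illustrative choices, not the project's actual setup.

```python
import random


def proportion_gap(minority, total, frac):
    """In-buffer minus out-of-buffer weighted minority proportion."""
    in_minority = sum(m * f for m, f in zip(minority, frac))
    in_total = sum(t * f for t, f in zip(total, frac))
    out_minority = sum(m * (1 - f) for m, f in zip(minority, frac))
    out_total = sum(t * (1 - f) for t, f in zip(total, frac))
    return in_minority / in_total - out_minority / out_total


def permutation_test(minority, total, frac, n_permutations=2000, seed=0):
    """Return the observed gap and a two-sided permutation p-value."""
    rng = random.Random(seed)
    observed = proportion_gap(minority, total, frac)
    shuffled = list(frac)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign overlap fractions at random: the "no association" world
        rng.shuffle(shuffled)
        if abs(proportion_gap(minority, total, shuffled)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Synthetic example: 20 high-minority block groups mostly inside buffers,
# 20 lower-minority block groups mostly outside
total = [1000] * 40
minority = [900] * 20 + [300] * 20
frac = [0.8] * 20 + [0.1] * 20

observed, p_value = permutation_test(minority, total, frac)
print(f"observed gap: {observed:.3f}, p-value: {p_value:.4f}")
```

In this deliberately lopsided example almost no shuffled arrangement comes close to the observed gap, so the p-value is tiny; with real data the gap is compared against the simulated distribution in the same way.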

Here is an example of the results for the buffer size of 2,000m.

The red line indicates the actual difference in proportions I found: 8.1%. The grey bell-shaped curve shows the differences from each of the thousands of permutations I ran, connected into a line. Under the assumption that there is no association between flaring sites and the minority makeup of a block group, the vast majority of differences in the proportions lie between -5% and 5%. In actuality, only one of the simulations found a more extreme difference than our actual difference of 8.1%. This corresponds to a p-value, for those interested, of 0.0005, indicating that it would be extremely unlikely to see the actual difference I did under the assumption that there’s no association between flaring sites and the minority makeup of a block group. So, this is evidence of disproportionate impact at the 2,000m buffer radius, but what about at other distances from flares? Does the relationship hold for people living closer or farther away?

I tested the nature of this relationship by applying the permutation test structure to a variety of buffer sizes, from as close as a 100m radius out to 5,000m. These results are captured in the plot below.

This is a cousin of the first permutation chart, modified to make the results easier to digest when multiple buffer sizes are plotted side-by-side. Each dot represents the actual difference of in-buffer to out-buffer minority proportions for a given buffer size. The horizontal line stretching to either side of the dot indicates the 90% confidence interval for that buffer size — given the permutation results, we can say with 90% confidence that the actual result should fall within the bounds of that line.

The picture that emerges is that at buffer sizes below 1,500m the sample size in California is not large enough to support conclusive evidence of disproportionate impact on either whites or minorities. At these sizes, the confidence intervals stretch across 0, indicating that we cannot be certain which groups are being impacted more. We do see these confidence intervals getting smaller as the buffer size increases, which reflects the larger sample sizes; as the buffer size increases, more total people are included in the analysis and the estimation of the direction and magnitude of impact becomes more precise.

At the 2,000m buffer and above, we can say with confidence that minorities are being disproportionally impacted by flares relative to whites. It’s important to remember, however, that this analysis doesn’t say anything about the specific air quality impacts experienced at those distances, or that there is any causal link between higher-minority communities and flare sites. It only tells us that minorities living between 2,000 and 5,000m from a flare are disproportionally impacted by the presence of flaring sites, relative to whites, and that we do not have enough data to say which types of communities are most affected at buffer sizes below 2,000m. 

Researchers generally agree that beyond 5,000m the harmful particles and chemicals released with a flare will disperse to the baseline levels of the surrounding area. But since some of these chemicals are known carcinogens, there are no accepted safe levels; any amount can potentially be harmful. Given that, these results could be valuable for affected communities and social/environmental justice advocates lobbying for transitions to renewable energy and other less-harmful forms of energy production.

Implications and Future Work

This project provides valuable contributions to environmental justice discussions of methane flaring in California. At the regulatory and operator level, the Flares Dashboard provides a framework for prioritizing remediation efforts targeted at individual flaring sites. The Block Group Dashboard is also helpful for prioritizing remediation but goes a step further in visualizing the varying impacts among the affected block groups surrounding each flare site. For both dashboards, letting users control the estimated impact area through the buffer size, as well as the relative weights of the three variables that make up the Impact Metric, shows how sensitive the results are to these choices and provides compelling evidence for RMI and other organizations to invest more deeply in this work.

This could take the form of bringing in additional external stakeholders to identify influential social factors that determine impact, such as the role of income, race, age, and prevalence of other conditions like asthma that make air pollution even more difficult to bear. The PM2.5 Environmental Justice Index I used from the EPA takes some of this into account, but perhaps there are other factors that should be included in the estimation of social vulnerability; affected communities and other climate leaders can help determine these. This stakeholder engagement can also help identify the most valuable data products and tools to come out of the project, i.e. what types of data and tools do these communities need to most effectively advocate for themselves?

Further work could also involve more sophisticated modeling techniques, like integrating meteorological data, or expanded functionality, like integrating economic measures to develop an index of flare mitigation potential from an economic perspective.

The analysis of disproportionate impact provides valuable evidence for affected communities and social/environmental justice advocates that block groups with higher concentrations of minorities are being disproportionally affected by flaring at a buffer size of 2,000m radius or larger. While the potential harm from a flare dissipates as you get farther away from the site, 2,000m is still well within range of these dangerous chemicals and particulates. Additionally, this analysis could be bolstered with data from other areas impacted by flaring. Adding in data from flaring in Texas or North Dakota, for instance, could provide the statistical power to show evidence of disproportionate impact on minorities for the combined areas.

Personally, I’m excited by the value of harnessing data science for environmental justice. As a capstone project to my master’s degree, I could not have asked for more interesting or meaningful work. There are also plenty of areas to improve on the project, in addition to the directions just mentioned. I’d love to add a time-based component to the impact calculations, allowing users to explore comprehensive impact estimates for a range of time, or slice down to yearly estimates. Another valuable future iteration would be building the work on top of API connections to the satellite flaring data and to the EPA and U.S. Census data on social demographics and vulnerability. This would improve the shelf life of the tool and provide the foundation for connecting this work to RMI’s existing browser-based visualization tool, OCI+.

I hope you find this as interesting and valuable as I do. If you have any feedback or suggestions for improvement, I’m all ears. You can see the work on the Tableau Public platform at the link below.

Skills

  • Programmatic data gathering and cleaning in Python

  • Statistical analysis in R using permutation testing

  • Spatial analysis in Python using geopandas

  • Creating an interactive dashboard in Tableau

Tools

  • RStudio

  • Python

  • Tableau

  • Canva

Results

I did not get to be involved in the rest of the project after this pilot phase, unfortunately, but I did hear from the folks at RMI that this work was valuable for later phases of the project. RMI did launch the final public-facing version of this project in spring 2024 as the Flaring-EJ Risk Tool, which can be found here.
