From Oil to Undervaluation: Wage and Employment Trends in Houston

Houston is often seen as the oil capital of America, but its job market tells a more complicated story. This analysis shows that while energy jobs are shrinking, lower-paying service sector jobs are growing—and in some cases, dominating the local economy. Many of Houston’s most common occupations, like fast food workers and cashiers, earn less than the statewide average, even as they support the city’s daily life. Meanwhile, some of the highest-paid roles—like podiatrists and internal medicine physicians—are vanishingly rare.

This imbalance raises a key question: Is Houston becoming a city of low-wage, high-volume work?

My analysis focuses on:

Which sectors are growing or shrinking in Houston?
Using monthly data from 2015 to 2024, I calculate annual average employment and show which sectors are driving growth.
Is Houston moving away from oil?
By comparing trends in oil-linked sectors with service-oriented industries, I assess whether the city is undergoing a structural shift.
Which occupations in Houston are underpaid or overpaid compared to the Texas average?
I identify jobs where workers in Houston earn less than their statewide counterparts—and how widespread those roles are.
Which low-wage jobs employ the most people in Houston?
Some of the city’s most common jobs, such as food service or customer support, also offer wages below the state average.
Are there high-paying jobs that are rare in Houston?
I rank the most well-compensated roles that have surprisingly low local employment.

The Data

1-Texas_OES_Report: Based on the U.S. Bureau of Labor Statistics’ May 2024 Occupational Employment and Wage Statistics. I used statewide data across all sectors. Source: BLS OES Research Estimates.

2-Houston_OES_Report: Derived from the May 2024 OES data published by the U.S. Bureau of Labor Statistics. I downloaded the data for the Houston-Pasadena-The Woodlands, TX metro area from the Metropolitan and Nonmetropolitan Area Table.

3-SeriesReport: Based on the U.S. Bureau of Labor Statistics’ State and Area Employment, Hours, and Earnings (SAE) dataset. This time series data provides monthly employment figures by industry sector for specific metropolitan areas. I downloaded separate reports for five sectors within the Houston-The Woodlands-Sugar Land, TX MSA. For each sector, I used non-seasonally adjusted monthly data from 2015 to 2024.

Growing Sectors

To understand long-term employment trends in Houston, I collected monthly employment data from 2015 to 2024 for the city’s five largest sectors. I calculated the annual average employment for each sector by averaging monthly figures from January to December. Because the size of each sector differs significantly, I used percentage change to show year-over-year trends. The calculation process can be found in the markdown section.

The chart above shows how employment in these five industries has evolved over the past decade. Mining, Logging, and Construction, which largely represents the city’s oil and energy sector, is the only industry that ended the period below its 2015 level: employment swung sharply from year to year and finished nearly 2% lower than in 2015.

In contrast, Education and Health Services grew the fastest, with a cumulative increase of more than 25%, signaling a broader shift toward service-oriented employment. Leisure and Hospitality, as well as Service-Providing sectors, also saw strong growth of 21.2% and 18.5% respectively. Retail Trade increased at a slower pace, rising by 6.9% over the same period.

Together, these trends suggest that while energy remains a central part of Houston’s identity, the city’s job market is increasingly defined by growth in education, healthcare, and service-related industries.
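The percentage figures in this section can be checked by hand. The sketch below is illustrative only: it takes the rounded 2015 and 2024 annual averages for two sectors from the table printed in the R Markdown output, so the last decimal may differ slightly from the chart labels. The data frame and its column names (`annual_avg`, `y2015`, `y2024`) are invented for this example.

```r
# Annual average employment (thousands of jobs), rounded values copied
# from the sector table in the R Markdown output.
annual_avg <- data.frame(
  sector = c("Mining, Logging, and Construction",
             "Education and Health Services"),
  y2015  = c(317, 366),
  y2024  = c(311, 459)
)

# Percent change relative to the 2015 base year:
# (value - base) / base * 100
annual_avg$pct_change <-
  (annual_avg$y2024 - annual_avg$y2015) / annual_avg$y2015 * 100

print(round(annual_avg$pct_change, 1))
# -1.9 25.4
```

Using a common base year puts sectors of very different sizes on the same scale, which is why the chart plots percent change rather than raw headcount.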

Decline of Oil

To verify whether Houston’s energy sector is truly in long-term decline, I examined the three subsectors under “Mining, Logging, and Construction”: Mining and Logging, Oil and Gas Extraction, and Support Activities for Mining. I downloaded monthly employment data from 2015 to 2025 for each subsector and combined them using R. The resulting stacked area chart illustrates a clear downward trend across all three components.

No single subsector is solely responsible for the decline—employment is shrinking across the board.

While this sector still employs more workers than most others combined, the decline is undeniable. As the Houston Chronicle reports, the contraction is driven mainly by industry consolidation and sharp cuts in spending.

“Reports of Shell layoffs point to 10-year trend that cost Houston 60,000 jobs and counting”: Major oil companies like Shell and ExxonMobil are now spending 67% less on exploration than they did in 2013. In Houston, the impact of these cuts is already being felt.

Wage Gaps!

To assess whether Houston is becoming a city of low-wage, high-volume work, it’s essential to look not just at employment volume but also at wage levels. To do this, I analyzed the May 2024 Occupational Employment and Wage Statistics from the U.S. Bureau of Labor Statistics for both Texas overall and the Houston metropolitan area.

Hourly mean wage is the core metric here. I joined the two datasets by SOC (Standard Occupational Classification) code. Since the Texas data breaks down some occupations by industry, I calculated the average hourly wage across all industries for each SOC code and summed the total employment to ensure a fair comparison. The full calculation process is documented in the markdown section below.

The top 10 underpaid jobs in Houston show relatively small wage differences, typically less than $3/hour, with the largest gap seen in sales and related occupations, where Houston workers earn about $7.36/hour less than their statewide counterparts. The rest of the list includes fast food workers, construction laborers, cashiers, stockers, and customer service representatives, all of whom experience minor pay disadvantages compared to the Texas average.

Most of the occupations that earn significantly more in Houston are medical roles. At the top of the list are pathologists, who earn $135.66/hour in Houston, nearly $47/hour more than the statewide average. Other medical professions such as ophthalmologists, internal medicine physicians, dentists, and podiatrists also earn over $20/hour more in Houston than in Texas overall.

Common Low-Wage Jobs

Beyond wage extremes, it’s worth asking: Which low-paid jobs in Houston employ the most people?

I filtered for occupations where Houston wages fall below the statewide average, then ranked by local employment.

The result highlights key sectors like food services, construction, personal care, and retail—jobs that support the city’s infrastructure and everyday life, yet often offer relatively low pay. Nearly 196,000 people work in education in Houston, earning an average of $31.14 per hour, $15 below the statewide average.

High Pay, Low Headcount

Just for fun, I looked at jobs where the average hourly wage in Houston is higher than the state average but very few people are employed, then ranked them by how many people actually work in those roles locally. I pulled out the 10 least common ones, and it turns out most of them are super niche, with only 30 to 70 people employed in the entire metro area.

But hey: Media and Communication Equipment Workers turned out to be not only rare, but also pretty well paid in Houston. In fact, their wage gap is second only to podiatrists.

R Markdown

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(janitor)

## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(readxl)
library(rvest)

## 
## Attaching package: 'rvest'
## 
## The following object is masked from 'package:readr':
## 
##     guess_encoding

R-Growing Sectors

# Set your data folder path
data_path <- "~/Documents/GitHub/ava-hu/data"

# Only load the 5 sector files
file_paths <- list.files(
  path = data_path,
  pattern = "^SeriesReport.*\\.xlsx$|^HoustonMining.*\\.xlsx$",
  full.names = TRUE
)

# Get sector name from metadata
get_sector_name <- function(file) {
  meta <- read_excel(file, range = "A8:B8", col_names = FALSE)
  sector <- meta[[2]]
  if (is.na(sector)) sector <- str_remove(basename(file), "\\.xlsx")
  return(sector)
}

# Extract and compute Jan–Dec average for each year
extract_annual_avg <- function(file) {
  df <- read_excel(file, skip = 12) |> 
    clean_names() |> 
    select(year, jan:dec) |> 
    filter(!is.na(year) & str_detect(as.character(year), "^\\d{4}")) |> 
    mutate(across(jan:dec, as.numeric)) |> 
    mutate(year = as.integer(year))

  df$annual_avg <- rowMeans(df[, c("jan", "feb", "mar", "apr", "may", "jun",
                                   "jul", "aug", "sep", "oct", "nov", "dec")],
                            na.rm = TRUE)

  df <- df |> 
    filter(year >= 2015, year <= 2024)

  sector <- get_sector_name(file)

  df |> 
    select(year, annual_avg) |> 
    pivot_wider(names_from = year, values_from = annual_avg) |> 
    mutate(Sector = sector) |> 
    select(Sector, everything())
}

# Run across all files
sector_avg_table <- bind_rows(lapply(file_paths, extract_annual_avg))

## New names:
## New names:
## New names:
## New names:
## New names:
## • `` -> `...1`
## • `` -> `...2`

# View result
print(sector_avg_table)

## # A tibble: 5 × 11
##   Sector   `2015` `2016` `2017` `2018` `2019` `2020` `2021` `2022` `2023` `2024`
##   <chr>     <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1 Mining,…   317.   300.   292.   302.   316.   283.   270.   290.   308.   311.
## 2 Service…  2433.  2472.  2510.  2559.  2610.  2500.  2592.  2738.  2839.  2882.
## 3 Retail …   301.   308.   310.   310.   305.   294.   307.   316.   320.   322.
## 4 Leisure…   300.   312.   317.   325.   334.   282.   310.   339.   356.   363.
## 5 Educati…   366.   377.   386.   394.   405.   397    409.   426.   448.   459.

trend_summary <- sector_avg_table |> 
  pivot_longer(-Sector, names_to = "year", values_to = "avg_employment") |> 
  mutate(
    year = as.integer(year),
    industry = Sector
  ) |> 
  select(industry, year, avg_employment)

trend_pct <- trend_summary |> 
  group_by(industry) |> 
  mutate(
    base_2015 = avg_employment[year == 2015],
    pct_change = (avg_employment - base_2015) / base_2015 * 100
  )

# Label for 2024
label_2024 <- trend_pct |> 
  filter(year == 2024) |> 
  mutate(
    label = paste0(
      ifelse(pct_change >= 0, "+", ""),
      round(pct_change, 1), "%"
    )
  )

# Plot with Y axis in %
ggplot(trend_pct, aes(x = year, y = pct_change, color = industry)) +
  geom_line(linewidth = 1.1) +  # linewidth replaces size for lines since ggplot2 3.4.0
  geom_text(data = label_2024, aes(label = label), 
            hjust = -0.1, vjust = 0.5, show.legend = FALSE, size = 3.5) +
  xlim(2015, 2025.5) +
  labs(
    title = "Percent Change in Employment by Sector (Relative to 2015)",
    subtitle = "Annual average employment, 2015–2024",
    x = "Year", y = "Change Since 2015 (%)",
    color = "Sector"
  ) +
  theme_minimal()

Decline of Oil?

data_path <- "~/Documents/GitHub/ava-hu/data/oil"
files <- list.files(data_path, pattern = "\\.xlsx$", full.names = TRUE)

read_bls_file_with_label <- function(file) {
  
  industry_name <- read_excel(file, sheet = "BLS Data Series", range = "B9", col_names = FALSE) |> pull(1)

  read_excel(file, sheet = "BLS Data Series", skip = 11) |>
    rename(Year = 1) |>
    pivot_longer(-Year, names_to = "Month", values_to = "Employment") |>
    filter(!is.na(Employment)) |>
    mutate(
      Industry = industry_name,
      date = as.Date(paste(Year, Month, "01", sep = "-"), format = "%Y-%b-%d")
    )
}

oil_data <- map_dfr(files, read_bls_file_with_label)

## New names:
## New names:
## New names:
## • `` -> `...1`
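One caveat about the date parsing above: the `%b` format in `as.Date()` reads month abbreviations in the session's current locale, so "Jan" or "Dec" may fail to parse on a machine that is not set to English. A minimal locale-independent sketch, assuming the month labels are the standard three-letter English abbreviations (the toy data frame `df` is hypothetical):

```r
# Hypothetical rows shaped like the long-format BLS data: a year plus a
# three-letter English month label.
df <- data.frame(Year = c(2016, 2016, 2020),
                 Month = c("Jan", "Dec", "Apr"))

# month.abb is base R's built-in vector of English month abbreviations,
# so match() returns the month number regardless of the session locale.
df$month_num <- match(df$Month, month.abb)

# Build an ISO date string (first day of the month) and convert it.
df$date <- as.Date(sprintf("%d-%02d-01", as.integer(df$Year), df$month_num))

print(df)
```

Any month label that is not in `month.abb` comes back as `NA`, which makes unexpected column names easy to spot.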

# Find each subsector's largest single-year drop in annual average employment
oil_data |>
  mutate(Year = lubridate::year(date)) |>
  group_by(Industry, Year) |>
  summarise(annual_avg = mean(Employment, na.rm = TRUE), .groups = "drop") |>
  arrange(Industry, Year) |>
  group_by(Industry) |>
  mutate(change = annual_avg - lag(annual_avg)) |>
  filter(!is.na(change)) |>
  slice_min(change, n = 1)  # replaces the superseded top_n(-1, change)

## # A tibble: 3 × 4
## # Groups:   Industry [3]
##   Industry                       Year annual_avg change
##   <chr>                         <dbl>      <dbl>  <dbl>
## 1 Mining and Logging             2016       81.2 -18.6 
## 2 Oil and Gas Extraction         2016       42.6  -8.92
## 3 Support Activities for Mining  2016       36.6  -9.63

R-Wage Gaps

UPLOAD & CLEAN DATA: Houston OES Data - This dataset includes wage and employment information for each occupation in the Houston-The Woodlands-Sugar Land metropolitan area.

houston_clean <- read_excel("../data/houston_OES_Report.xlsx", skip = 5) |> 
  clean_names() |> 
  mutate(
    soc_code = str_extract(occupation_soc_code, "\\d{2}-\\d{4}"),
    hourly_mean_houston = as.numeric(hourly_mean_wage),
    employment_houston = as.numeric(employment_1)
  ) |> 
  filter(!is.na(soc_code), !is.na(hourly_mean_houston), !is.na(employment_houston))

## Warning: There were 2 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `hourly_mean_houston = as.numeric(hourly_mean_wage)`.
## Caused by warning:
## ! NAs introduced by coercion
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
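To show what the SOC extraction step does, here is a small base-R sketch of the same pattern match. It is an equivalent of the `str_extract()` call written without stringr so it runs on its own, not the code used above. The first two titles follow the "Title (code)" style seen in the Houston table; the third is a made-up row without a code.

```r
# Titles in the style of the Houston table's combined title/code column
titles <- c("Cashiers (41-2011)", "Podiatrists (29-1081)", "No code here")

# Find the first "two digits, hyphen, four digits" run in each string.
# regexpr() returns -1 where there is no match.
m <- regexpr("[0-9]{2}-[0-9]{4}", titles)

# Start with NA everywhere, then fill in the rows that matched.
soc_code <- rep(NA_character_, length(titles))
soc_code[m > 0] <- regmatches(titles, m)

print(soc_code)
# "41-2011" "29-1081" NA
```

Rows without a code come back as `NA`, which is why the cleaning step filters on `!is.na(soc_code)`.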

Texas OES Data

This dataset contains overall Texas occupational employment and wage estimates. We’ll clean and align it to match the Houston dataset format.

To account for the fact that Texas OES data includes multiple entries per occupation across different industries (NAICS classifications), I grouped the dataset by SOC code. For each occupation, I calculated the average hourly wage across all industries and summed up the total employment. This aggregation provides a single representative wage and employment figure per occupation, allowing a clean, one-to-one comparison with the Houston dataset.

texas <- read_csv("../data/texas_oes.csv") |> 
  clean_names() |> 
  mutate(
    soc_code = str_trim(occ_code)
  )

## Rows: 242299 Columns: 26
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (23): AREA_TITLE, NAICS, NAICS_TITLE, I_GROUP, OCC_CODE, OCC_TITLE, O_GR...
## dbl  (1): AREA
## lgl  (2): ANNUAL, HOURLY
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# `texas` already has cleaned names and a trimmed soc_code from the step above.
# as.numeric() turns non-numeric wage and employment entries into NA,
# which na.rm = TRUE then skips.
texas_clean <- texas |> 
  group_by(soc_code) |> 
  summarise(
    hourly_mean_texas = mean(as.numeric(h_mean), na.rm = TRUE),
    employment_texas = sum(as.numeric(tot_emp), na.rm = TRUE),
    .groups = "drop"
  )

## Warning: There were 864 warnings in `summarise()`.
## The first warning was:
## ℹ In argument: `hourly_mean_texas = mean(as.numeric(h_mean), na.rm = TRUE)`.
## ℹ In group 3: `soc_code = "11-1011"`.
## Caused by warning in `mean()`:
## ! NAs introduced by coercion
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 863 remaining warnings.
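The aggregation above takes a simple mean of the industry-level wages, so a small industry counts as much as a large one. An employment-weighted mean is one alternative. The sketch below compares the two on made-up numbers; it is not a recalculation of the Texas figures, and the toy data frame `rows` is hypothetical.

```r
# Hypothetical industry rows for a single SOC code (numbers are invented)
rows <- data.frame(
  soc_code = c("41-2011", "41-2011"),
  h_mean   = c(20, 30),    # hourly mean wage within each industry
  tot_emp  = c(900, 100)   # employment within each industry
)

# Simple mean: each industry row counts equally
simple_mean <- mean(rows$h_mean)                           # 25

# Employment-weighted mean: each worker counts equally
weighted_mean <- weighted.mean(rows$h_mean, rows$tot_emp)  # 21

# Total employment across industries
total_emp <- sum(rows$tot_emp)                             # 1000

print(c(simple = simple_mean, weighted = weighted_mean))
```

When most workers sit in the lower-paying industry, as in this toy case, the simple mean overstates the typical wage. Which version suits the Texas comparison depends on how the industry rows in the source file overlap.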

MERGE & COMPARE: Join the two datasets by SOC code, then calculate wage gaps and prepare for analysis.

wage_compare <- inner_join(
  houston_clean, texas_clean,
  by = "soc_code"
) |> 
  distinct(soc_code, .keep_all = TRUE)
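An inner join silently drops any occupation that appears in only one of the two tables, so it can be worth checking what falls out. A minimal sketch on hypothetical code vectors; on the real data frames, `dplyr::anti_join()` would return the same unmatched rows.

```r
# Hypothetical SOC codes present in each cleaned table
houston_codes <- c("41-2011", "29-1081", "35-3023")
texas_codes   <- c("41-2011", "35-3023", "11-1011")

# Codes that the inner join would drop from each side
only_houston <- setdiff(houston_codes, texas_codes)
only_texas   <- setdiff(texas_codes, houston_codes)

# Codes that survive the join
matched <- intersect(houston_codes, texas_codes)

print(list(only_houston = only_houston,
           only_texas   = only_texas,
           matched      = matched))
```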

Filtered for occupations where workers in Houston earn less than their counterparts statewide. Then, we calculated the absolute wage gap and ranked the results by Houston employment, keeping the 10 largest occupations.

top10_low_paid_high_employment <- wage_compare |> 
  filter(hourly_mean_houston < hourly_mean_texas) |> 
  mutate(
    wage_gap = hourly_mean_texas - hourly_mean_houston
  ) |> 
  arrange(desc(employment_houston)) |> 
  slice(1:10) |> 
  select(
    soc_code,
    occ_title = occupation_soc_code,
    hourly_mean_houston,
    hourly_mean_texas,
    wage_gap,
    employment_houston
  )

print(top10_low_paid_high_employment)

## # A tibble: 10 × 6
##    soc_code occ_title             hourly_mean_houston hourly_mean_texas wage_gap
##    <chr>    <chr>                               <dbl>             <dbl>    <dbl>
##  1 35-0000  Food Preparation and…                14.9              17.4    2.51 
##  2 41-0000  Sales and Related Oc…                24.4              31.7    7.36 
##  3 47-0000  Construction and Ext…                26.6              28.2    1.62 
##  4 31-0000  Healthcare Support O…                16.4              19.2    2.77 
##  5 35-3023  Fast Food and Counte…                12.8              14.2    1.42 
##  6 37-0000  Building and Grounds…                16.3              17.1    0.798
##  7 41-2031  Retail Salespersons …                16.3              17.4    1.19 
##  8 43-4051  Customer Service Rep…                20.4              20.9    0.487
##  9 53-7065  Stockers and Order F…                18.2              19.6    1.41 
## 10 41-2011  Cashiers (41-2011)                   14.1              14.8    0.682
## # ℹ 1 more variable: employment_houston <dbl>

The reverse: Occupations where Houston pays more than the state average. We ranked them by wage difference and kept the top 10.

top10_houston_above_texas <- wage_compare |> 
  filter(hourly_mean_houston > hourly_mean_texas) |> 
  mutate(wage_gap = hourly_mean_houston - hourly_mean_texas) |> 
  arrange(desc(wage_gap)) |> 
  distinct(soc_code, .keep_all = TRUE) |>  
  slice(1:10) |> 
  select(
    soc_code,
    occ_title = occupation_soc_code,
    hourly_mean_houston,
    hourly_mean_texas,
    wage_gap,
    employment_houston,
    employment_texas
  )

print(top10_houston_above_texas)

## # A tibble: 10 × 7
##    soc_code occ_title             hourly_mean_houston hourly_mean_texas wage_gap
##    <chr>    <chr>                               <dbl>             <dbl>    <dbl>
##  1 29-1222  Physicians, Patholog…               136.               88.8     46.8
##  2 29-1241  Ophthalmologists, Ex…               144.              109.      35.3
##  3 29-1216  General Internal Med…               150.              120.      30.7
##  4 29-1021  Dentists, General (2…               116.               89.9     25.7
##  5 29-1229  Physicians, All Othe…               124.               99.6     23.9
##  6 13-2052  Personal Financial A…                64.7              42.0     22.7
##  7 11-1011  Chief Executives (11…               161.              139.      22.7
##  8 29-1081  Podiatrists (29-1081)               111.               88.9     22.0
##  9 13-2081  Tax Examiners and Co…                48.1              28.0     20.1
## 10 29-1125  Recreational Therapi…                46.3              27.9     18.4
## # ℹ 2 more variables: employment_houston <dbl>, employment_texas <dbl>

Low wages high employment

top10_low_paid_high_employment <- wage_compare |> 
  filter(hourly_mean_houston < hourly_mean_texas) |> 
  mutate(
    wage_gap = hourly_mean_texas - hourly_mean_houston
  ) |> 
  arrange(desc(employment_houston)) |> 
  distinct(soc_code, .keep_all = TRUE) |> 
  slice(1:10) |> 
  select(
    soc_code,
    occ_title = occupation_soc_code,
    hourly_mean_houston,
    hourly_mean_texas,
    wage_gap,
    employment_houston
  )

print(top10_low_paid_high_employment)

## # A tibble: 10 × 6
##    soc_code occ_title             hourly_mean_houston hourly_mean_texas wage_gap
##    <chr>    <chr>                               <dbl>             <dbl>    <dbl>
##  1 35-0000  Food Preparation and…                14.9              17.4    2.51 
##  2 41-0000  Sales and Related Oc…                24.4              31.7    7.36 
##  3 47-0000  Construction and Ext…                26.6              28.2    1.62 
##  4 31-0000  Healthcare Support O…                16.4              19.2    2.77 
##  5 35-3023  Fast Food and Counte…                12.8              14.2    1.42 
##  6 37-0000  Building and Grounds…                16.3              17.1    0.798
##  7 41-2031  Retail Salespersons …                16.3              17.4    1.19 
##  8 43-4051  Customer Service Rep…                20.4              20.9    0.487
##  9 53-7065  Stockers and Order F…                18.2              19.6    1.41 
## 10 41-2011  Cashiers (41-2011)                   14.1              14.8    0.682
## # ℹ 1 more variable: employment_houston <dbl>
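The table above mixes two levels of the SOC hierarchy. Codes ending in "-0000" are major occupation groups, which aggregate the detailed occupations beneath them: 35-0000, for example, includes 35-3023. A minimal sketch of separating the two levels before ranking, using codes copied from the table (`is_major_group` and `detailed_only` are names invented for this example):

```r
# SOC codes taken from the table above
soc_code <- c("35-0000", "41-0000", "35-3023", "41-2031", "41-2011")

# Major occupation groups end in "-0000"; detailed occupations do not
is_major_group <- grepl("-0000$", soc_code)

# Keep only detailed occupations so no worker is counted twice
detailed_only <- soc_code[!is_major_group]

print(detailed_only)
# "35-3023" "41-2031" "41-2011"
```

Ranking within one level at a time keeps the same workers from showing up both in a group total and in a detailed occupation.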

Higher wages low employment

top10_high_paid_low_employment <- wage_compare |> 
  filter(hourly_mean_houston > hourly_mean_texas) |> 
  mutate(
    wage_gap = hourly_mean_houston - hourly_mean_texas
  ) |> 
  arrange(employment_houston) |>  
  distinct(soc_code, .keep_all = TRUE) |> 
  slice(1:10) |> 
  select(
    soc_code,
    occ_title = occupation_soc_code,
    hourly_mean_houston,
    hourly_mean_texas,
    wage_gap,
    employment_houston
  )

print(top10_high_paid_low_employment)

## # A tibble: 10 × 6
##    soc_code occ_title             hourly_mean_houston hourly_mean_texas wage_gap
##    <chr>    <chr>                               <dbl>             <dbl>    <dbl>
##  1 27-4099  Media and Communicat…                41.6              35.9    5.77 
##  2 33-3031  Fish and Game Warden…                38.4              37.4    0.950
##  3 33-3041  Parking Enforcement …                18.5              18.2    0.287
##  4 19-4071  Forest and Conservat…                24.5              23.2    1.24 
##  5 27-1012  Craft Artists (27-10…                24.0              23.6    0.383
##  6 53-4099  Rail Transportation …                14.6              13.8    0.810
##  7 19-4044  Hydrologic Technicia…                26.9              25.9    0.998
##  8 49-9069  Precision Instrument…                38.6              32.9    5.70 
##  9 11-9013  Farmers, Ranchers, a…                40.9              38.7    2.13 
## 10 29-1081  Podiatrists (29-1081)               111.               88.9   22.0  
## # ℹ 1 more variable: employment_houston <dbl>