Houston is often seen as the oil capital of America, but its job market tells a more complicated story. This analysis shows that while energy jobs are shrinking, lower-paying service sector jobs are growing—and in some cases, dominating the local economy. Many of Houston’s most common occupations, like fast food workers and cashiers, earn less than the statewide average, even as they support the city’s daily life. Meanwhile, some of the highest-paid roles—like podiatrists and internal medicine physicians—are vanishingly rare.
This imbalance raises a key question: Is Houston becoming a city of low-wage, high-volume work?
My analysis focuses on:
Which sectors are growing or shrinking in Houston?
Using monthly data from 2015 to 2024, I calculate annual average employment and show which sectors are driving growth.
Is Houston moving away from oil?
By comparing trends in oil-linked sectors with service-oriented industries, I assess whether the city is undergoing a structural shift.
Which occupations in Houston are underpaid or overpaid compared to the Texas average?
I identify jobs where workers in Houston earn less than their statewide counterparts, and how widespread those roles are.
Which low-wage jobs employ the most people in Houston?
Some of the city’s most common jobs, such as food service or customer support, also offer wages below the state average.
Are there high-paying jobs that are rare in Houston?
I rank the most well-compensated roles that have surprisingly low local employment.
1-Texas_OES_Report: Based on the U.S. Bureau of Labor Statistics’ May 2024 Occupational Employment and Wage Statistics. I used statewide data across all sectors. Source: BLS OES Research Estimates.
2-Houston_OES_Report: Derived from the May 2024 OES data published by the U.S. Bureau of Labor Statistics. I downloaded the data for the Houston-Pasadena-The Woodlands, TX metro area from the Metropolitan and Nonmetropolitan Area Table.
3-SeriesReport: Based on the U.S. Bureau of Labor Statistics’ State and Area Employment, Hours, and Earnings (SAE) dataset. This time series data provides monthly employment figures by industry sector for specific metropolitan areas. I downloaded separate reports for five sectors within the Houston-The Woodlands-Sugar Land, TX MSA. For each sector, I used non-seasonally adjusted monthly data from 2015 to 2024.
To understand long-term employment trends in Houston, I collected monthly employment data from 2015 to 2024 for the city’s five largest sectors. I calculated the annual average employment for each sector by averaging monthly figures from January to December. Because the size of each sector differs significantly, I expressed each year's employment as a percentage change from the sector's 2015 level, so that sectors of different sizes can be compared on one scale. The calculation process can be found in the markdown section below.
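The percentage calculation can be sketched on toy numbers as follows. This is a minimal illustration in base R, not the code used for the chart; the sector names and employment values are hypothetical, and each sector's 2015 level is assumed to be the baseline.

```r
# Toy annual averages (in thousands) for two hypothetical sectors.
employment <- data.frame(
  sector = rep(c("Sector A", "Sector B"), each = 3),
  year = rep(c(2015, 2016, 2017), times = 2),
  avg_employment = c(200, 210, 190, 50, 55, 60)
)

# Look up each sector's 2015 value and attach it to every row of that sector.
base_rows <- employment[employment$year == 2015, c("sector", "avg_employment")]
names(base_rows)[2] <- "base_2015"
employment <- merge(employment, base_rows, by = "sector")

# Percent change relative to the 2015 baseline.
employment$pct_change <- (employment$avg_employment - employment$base_2015) /
  employment$base_2015 * 100

# Restore a predictable row order before printing.
employment <- employment[order(employment$sector, employment$year), ]
print(employment)
```

Dividing by the baseline is what lets a sector of 50 thousand workers and one of 200 thousand share a single axis.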
The chart above shows how employment in these five industries has evolved over the past decade. Mining, Logging, and Construction, which largely represents the city’s oil and energy sector, is the only sector that ended the period below its 2015 level. Employment fell after 2015, recovered by 2019, dropped again in 2020 and 2021, and by 2024 was still nearly 2% below where it started.
In contrast, Education and Health Services grew the fastest, with a cumulative increase of more than 25%, signaling a broader shift toward service-oriented employment. Leisure and Hospitality, as well as Service-Providing sectors, also saw strong growth of 21.2% and 18.5% respectively. Retail Trade increased at a slower pace, rising by 6.9% over the same period.
Together, these trends suggest that while energy remains a central part of Houston’s identity, the city’s job market is increasingly defined by growth in education, healthcare, and service-related industries.
To verify whether Houston’s energy sector is truly in long-term decline, I examined the three subsectors under “Mining, Logging, and Construction”: Mining and Logging, Oil and Gas Extraction, and Support Activities for Mining. I downloaded monthly employment data from 2015 to 2025 for each subsector and combined them using R. The resulting stacked area chart illustrates a clear downward trend across all three components.
No single subsector is solely responsible for the decline—employment is shrinking across the board.
While the sector remains a major employer, the decline is undeniable. As the Houston Chronicle reports, it is driven mainly by industry consolidation and spending cuts.
"Reports of Shell layoffs point to 10-year trend that cost Houston 60,000 jobs and counting": Major oil companies like Shell and ExxonMobil are now spending 67% less on exploration than they did in 2013. In Houston, the impact of these cuts is already being felt.
To assess whether Houston is becoming a city of low-wage, high-volume work, it’s essential to look not just at employment volume but also at wage levels. To do this, I analyzed the May 2024 Occupational Employment and Wage Statistics from the U.S. Bureau of Labor Statistics for both Texas overall and the Houston metropolitan area.
Hourly mean wage is the core metric here. I joined the two datasets by SOC (Standard Occupational Classification) code. Since the Texas data breaks down some occupations by industry, I calculated the average hourly wage across all industries for each SOC code and summed the total employment to ensure a fair comparison. The full calculation process is documented in the markdown section below.
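A simple average across industries treats a small industry the same as a large one. The sketch below, in base R with made-up SOC codes and values, puts the simple mean next to an employment-weighted mean. The weighted version is an alternative for comparison, not the calculation used in this analysis.

```r
# Hypothetical statewide rows: one occupation reported in two industries,
# a second occupation reported in one.
texas_rows <- data.frame(
  soc_code = c("11-1111", "11-1111", "22-2222"),
  h_mean = c(20, 40, 30),
  tot_emp = c(900, 100, 500)
)

# Collapse to one row per SOC code.
by_occupation <- do.call(rbind, lapply(
  split(texas_rows, texas_rows$soc_code),
  function(g) {
    data.frame(
      soc_code = g$soc_code[1],
      simple_mean = mean(g$h_mean),                            # each industry counts equally
      weighted_mean = weighted.mean(g$h_mean, w = g$tot_emp),  # larger industries count more
      employment = sum(g$tot_emp)
    )
  }
))
rownames(by_occupation) <- NULL
print(by_occupation)
```

For the first toy occupation the simple mean is 30 while the weighted mean is 22, because 90% of its workers sit in the lower-paying industry. If the statewide file also carries a cross-industry total row for each occupation, using that row directly may be simpler still; whether it does is an assumption to check against the file.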
The 10 largest occupations that pay below the Texas average show relatively small wage differences, typically less than $3/hour. The largest gap is in sales and related occupations, where Houston workers earn about $7.36/hour less than their statewide counterparts. The rest of the list includes fast food workers, construction and extraction workers, cashiers, stockers, and customer service representatives, all of whom face minor pay disadvantages compared to the Texas average.
Occupations that earn significantly more in Houston are mostly medical roles. At the top of the list are pathologists, who earn $135.66/hour in Houston, nearly $47/hour more than the statewide average. Other medical professions such as ophthalmologists, internal medicine physicians, dentists, and podiatrists also earn over $20/hour more in Houston than in Texas overall.
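Absolute gaps naturally favor occupations that already pay a lot, so a relative gap can be a useful companion measure. The sketch below uses two rows from the printed top-10 table in the code section, with values rounded as they appear there; the `pct_gap` column is an addition for illustration and is not part of the original analysis.

```r
# Rounded values copied from the printed top-10 table (dollars per hour).
gaps <- data.frame(
  occupation = c("Tax Examiners", "Chief Executives"),
  hourly_mean_houston = c(48.1, 161),
  hourly_mean_texas = c(28.0, 139)
)

# Absolute gap in dollars per hour, and the same gap as a share of the Texas wage.
gaps$wage_gap <- gaps$hourly_mean_houston - gaps$hourly_mean_texas
gaps$pct_gap <- gaps$wage_gap / gaps$hourly_mean_texas * 100

print(gaps)
```

The two absolute gaps are similar, at roughly $20 to $22 per hour, but the relative gap is far larger for the lower-paid occupation.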
Beyond wage extremes, it’s worth asking: Which low-paid jobs in Houston employ the most people?
I filtered for occupations where Houston wages fall below the statewide average, then ranked by local employment.
The result highlights key sectors like food services, construction, personal care, and retail—jobs that support the city’s infrastructure and everyday life, yet often offer relatively low pay. Nearly 196,000 people work in education in Houston, earning an average of $31.14 per hour, $15 below the statewide average.
Just for fun, I looked at jobs where the average hourly wage in Houston is higher than the state average but very few people are employed, ranking them by how many people actually work in those roles locally. I pulled out the 10 least common ones, and it turns out most of them are super niche, with only 30 to 70 people employed in the entire metro area.
But hey: Media and Communication Equipment Workers turned out to be not only rare, but also pretty well paid in Houston. In fact, their wage gap is second only to podiatrists.
library(tidyverse)
library(janitor)
library(readxl)
library(rvest)
# Set your data folder path
data_path <- "~/Documents/GitHub/ava-hu/data"
# Only load the 5 sector files
file_paths <- list.files(
path = data_path,
pattern = "^SeriesReport.*\\.xlsx$|^HoustonMining.*\\.xlsx$",
full.names = TRUE
)
# Get sector name from metadata
get_sector_name <- function(file) {
meta <- read_excel(file, range = "A8:B8", col_names = FALSE)
sector <- meta[[2]]
if (is.na(sector)) sector <- str_remove(basename(file), "\\.xlsx")
return(sector)
}
# Extract and compute Jan–Dec average for each year
extract_annual_avg <- function(file) {
df <- read_excel(file, skip = 12) |>
clean_names() |>
select(year, jan:dec) |>
filter(!is.na(year) & str_detect(as.character(year), "^\\d{4}")) |>
mutate(across(jan:dec, as.numeric)) |>
mutate(year = as.integer(year))
df$annual_avg <- rowMeans(df[, c("jan", "feb", "mar", "apr", "may", "jun",
"jul", "aug", "sep", "oct", "nov", "dec")],
na.rm = TRUE)
df <- df |>
filter(year >= 2015, year <= 2024)
sector <- get_sector_name(file)
df |>
select(year, annual_avg) |>
pivot_wider(names_from = year, values_from = annual_avg) |>
mutate(Sector = sector) |>
select(Sector, everything())
}
# Run across all files
sector_avg_table <- bind_rows(lapply(file_paths, extract_annual_avg))
## New names:
## New names:
## New names:
## New names:
## New names:
## • `` -> `...1`
## • `` -> `...2`
# View result
print(sector_avg_table)
## # A tibble: 5 × 11
## Sector `2015` `2016` `2017` `2018` `2019` `2020` `2021` `2022` `2023` `2024`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Mining,… 317. 300. 292. 302. 316. 283. 270. 290. 308. 311.
## 2 Service… 2433. 2472. 2510. 2559. 2610. 2500. 2592. 2738. 2839. 2882.
## 3 Retail … 301. 308. 310. 310. 305. 294. 307. 316. 320. 322.
## 4 Leisure… 300. 312. 317. 325. 334. 282. 310. 339. 356. 363.
## 5 Educati… 366. 377. 386. 394. 405. 397 409. 426. 448. 459.
trend_summary <- sector_avg_table |>
pivot_longer(-Sector, names_to = "year", values_to = "avg_employment") |>
mutate(
year = as.integer(year),
industry = Sector
) |>
select(industry, year, avg_employment)
trend_pct <- trend_summary |>
group_by(industry) |>
mutate(
base_2015 = avg_employment[year == 2015],
pct_change = (avg_employment - base_2015) / base_2015 * 100
)
# Label for 2024
label_2024 <- trend_pct |>
filter(year == 2024) |>
mutate(
label = paste0(
ifelse(pct_change >= 0, "+", ""),
round(pct_change, 1), "%"
)
)
# Plot with Y axis in %
ggplot(trend_pct, aes(x = year, y = pct_change, color = industry)) +
  # `linewidth` replaces `size` for lines as of ggplot2 3.4.0
  geom_line(linewidth = 1.1) +
  geom_text(data = label_2024, aes(label = label),
            hjust = -0.1, vjust = 0.5, show.legend = FALSE, size = 3.5) +
  # Extend the x axis past 2024 to leave room for the end-of-line labels
  xlim(2015, 2025.5) +
  labs(
    title = "Percent Change in Employment by Sector (Relative to 2015)",
    subtitle = "Annual average employment, 2015–2024",
    x = "Year", y = "Change Since 2015 (%)",
    color = "Sector"
  ) +
  theme_minimal()
data_path <- "~/Documents/GitHub/ava-hu/data/oil"
files <- list.files(data_path, pattern = "\\.xlsx$", full.names = TRUE)
read_bls_file_with_label <- function(file) {
industry_name <- read_excel(file, sheet = "BLS Data Series", range = "B9", col_names = FALSE) |> pull(1)
read_excel(file, sheet = "BLS Data Series", skip = 11) |>
rename(Year = 1) |>
pivot_longer(-Year, names_to = "Month", values_to = "Employment") |>
filter(!is.na(Employment)) |>
mutate(
Industry = industry_name,
date = as.Date(paste(Year, Month, "01", sep = "-"), format = "%Y-%b-%d")
)
}
oil_data <- map_dfr(files, read_bls_file_with_label)
## New names:
## New names:
## New names:
## • `` -> `...1`
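Parsing month names with `%b` depends on the session locale, so the date step above can return `NA` on a machine whose locale is not English. A locale-independent sketch in base R is shown below. It assumes the month columns are labeled with the English abbreviations "Jan" through "Dec", and the sample rows are made up.

```r
# Made-up rows in the same long shape as the reshaped BLS data.
long <- data.frame(
  Year = c(2015, 2015, 2016),
  Month = c("Jan", "Feb", "Dec")
)

# month.abb is a base R constant: "Jan", "Feb", ..., "Dec".
# match() returns each label's position, which is the month number.
month_number <- match(long$Month, month.abb)

# Build ISO dates (first day of each month) without locale-dependent parsing.
long$date <- as.Date(sprintf("%d-%02d-01", as.integer(long$Year), month_number))

print(long)
```

Any label that is not a month abbreviation produces `NA` from `match()`, which makes stray columns easy to spot and filter out.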
oil_data |>
mutate(Year = lubridate::year(date)) |>
group_by(Industry, Year) |>
summarise(annual_avg = mean(Employment, na.rm = TRUE), .groups = "drop") |>
arrange(Industry, Year) |>
group_by(Industry) |>
mutate(change = annual_avg - lag(annual_avg)) |>
filter(!is.na(change)) |>
slice_min(change, n = 1) # largest single-year drop per subsector; replaces the superseded top_n(-1, change)
## # A tibble: 3 × 4
## # Groups: Industry [3]
## Industry Year annual_avg change
## <chr> <dbl> <dbl> <dbl>
## 1 Mining and Logging 2016 81.2 -18.6
## 2 Oil and Gas Extraction 2016 42.6 -8.92
## 3 Support Activities for Mining 2016 36.6 -9.63
LOAD & CLEAN DATA: Houston OES Data - This dataset includes wage and employment information for each occupation in the Houston-Pasadena-The Woodlands metropolitan area.
houston_clean <- read_excel("../data/houston_OES_Report.xlsx", skip = 5) |>
clean_names() |>
mutate(
soc_code = str_extract(occupation_soc_code, "\\d{2}-\\d{4}"),
hourly_mean_houston = as.numeric(hourly_mean_wage),
employment_houston = as.numeric(employment_1)
) |>
filter(!is.na(soc_code), !is.na(hourly_mean_houston), !is.na(employment_houston))
## Warning: There were 2 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `hourly_mean_houston = as.numeric(hourly_mean_wage)`.
## Caused by warning:
## ! NAs introduced by coercion
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
Texas OES Data
This dataset contains overall Texas occupational employment and wage estimates. We’ll clean and align it to match the Houston dataset format.
To account for the fact that Texas OES data includes multiple entries per occupation across different industries (NAICS classifications), I grouped the dataset by SOC code. For each occupation, I calculated the average hourly wage across all industries and summed up the total employment. This aggregation provides a single representative wage and employment figure per occupation, allowing a clean, one-to-one comparison with the Houston dataset.
texas <- read_csv("../data/texas_oes.csv") |>
clean_names() |>
mutate(
soc_code = str_trim(occ_code)
)
## Rows: 242299 Columns: 26
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (23): AREA_TITLE, NAICS, NAICS_TITLE, I_GROUP, OCC_CODE, OCC_TITLE, O_GR...
## dbl (1): AREA
## lgl (2): ANNUAL, HOURLY
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# `texas` already has clean column names and a trimmed soc_code from the step above
texas_clean <- texas |>
  group_by(soc_code) |>
  summarise(
    hourly_mean_texas = mean(as.numeric(h_mean), na.rm = TRUE),  # simple mean across industry rows
    employment_texas = sum(as.numeric(tot_emp), na.rm = TRUE),   # total across industry rows
    .groups = "drop"
  )
## Warning: There were 864 warnings in `summarise()`.
## The first warning was:
## ℹ In argument: `hourly_mean_texas = mean(as.numeric(h_mean), na.rm = TRUE)`.
## ℹ In group 3: `soc_code = "11-1011"`.
## Caused by warning in `mean()`:
## ! NAs introduced by coercion
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 863 remaining warnings.
MERGE & COMPARE: Join the two datasets by SOC code, then calculate wage gaps and prepare for analysis.
wage_compare <- inner_join(
houston_clean, texas_clean,
by = "soc_code"
) |>
distinct(soc_code, .keep_all = TRUE)
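An inner join silently drops any occupation that appears in only one of the two datasets, so it can be worth checking how many codes fail to match. The sketch below uses base R set operations on made-up code vectors; it is a suggested sanity check, not a step from the original analysis.

```r
# Hypothetical SOC codes present in each dataset.
houston_codes <- c("11-1011", "29-1081", "35-3023")
texas_codes <- c("11-1011", "35-3023", "41-2011")

# Codes that survive an inner join, and codes that would be dropped from each side.
matched <- intersect(houston_codes, texas_codes)
only_houston <- setdiff(houston_codes, texas_codes)
only_texas <- setdiff(texas_codes, houston_codes)

cat("Matched:", length(matched), "\n")
cat("Houston only:", only_houston, "\n")
cat("Texas only:", only_texas, "\n")
```

With the real data frames, `dplyr::anti_join(houston_clean, texas_clean, by = "soc_code")` would list the Houston rows that have no statewide match.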
Filter for occupations where workers in Houston earn less than their counterparts statewide, calculate the absolute wage gap, and rank by Houston employment to keep the 10 largest such occupations.
top10_low_paid_high_employment <- wage_compare |>
filter(hourly_mean_houston < hourly_mean_texas) |>
mutate(
wage_gap = hourly_mean_texas - hourly_mean_houston
) |>
arrange(desc(employment_houston)) |>
slice(1:10) |>
select(
soc_code,
occ_title = occupation_soc_code,
hourly_mean_houston,
hourly_mean_texas,
wage_gap,
employment_houston
)
print(top10_low_paid_high_employment)
## # A tibble: 10 × 6
## soc_code occ_title hourly_mean_houston hourly_mean_texas wage_gap
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 35-0000 Food Preparation and… 14.9 17.4 2.51
## 2 41-0000 Sales and Related Oc… 24.4 31.7 7.36
## 3 47-0000 Construction and Ext… 26.6 28.2 1.62
## 4 31-0000 Healthcare Support O… 16.4 19.2 2.77
## 5 35-3023 Fast Food and Counte… 12.8 14.2 1.42
## 6 37-0000 Building and Grounds… 16.3 17.1 0.798
## 7 41-2031 Retail Salespersons … 16.3 17.4 1.19
## 8 43-4051 Customer Service Rep… 20.4 20.9 0.487
## 9 53-7065 Stockers and Order F… 18.2 19.6 1.41
## 10 41-2011 Cashiers (41-2011) 14.1 14.8 0.682
## # ℹ 1 more variable: employment_houston <dbl>
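The table above mixes major occupational groups (codes ending in `-0000`, such as 35-0000) with detailed occupations that belong to those groups (such as 35-3023), so some workers are counted in two rows. One way to rank only detailed occupations is sketched below in base R. The employment figures are hypothetical, and the filter is an optional refinement, not part of the original analysis.

```r
# SOC codes taken from the table above; employment figures are made up.
occupations <- data.frame(
  soc_code = c("35-0000", "35-3023", "41-0000", "41-2011"),
  employment = c(1000, 400, 900, 300)
)

# Major groups are the codes whose last four digits are all zero.
is_major_group <- grepl("-0000$", occupations$soc_code)

# Keep detailed occupations only, then rank by employment, largest first.
detailed <- occupations[!is_major_group, ]
detailed <- detailed[order(-detailed$employment), ]

print(detailed)
```

In the tidyverse pipeline, the same filter would be `filter(!str_detect(soc_code, "-0000$"))` placed before `arrange()`.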
The reverse: Occupations where Houston pays more than the state average. Again, we ranked them by wage difference and kept the top 10.
top10_houston_above_texas <- wage_compare |>
filter(hourly_mean_houston > hourly_mean_texas) |>
mutate(wage_gap = hourly_mean_houston - hourly_mean_texas) |>
arrange(desc(wage_gap)) |>
distinct(soc_code, .keep_all = TRUE) |>
slice(1:10) |>
select(
soc_code,
occ_title = occupation_soc_code,
hourly_mean_houston,
hourly_mean_texas,
wage_gap,
employment_houston,
employment_texas
)
print(top10_houston_above_texas)
## # A tibble: 10 × 7
## soc_code occ_title hourly_mean_houston hourly_mean_texas wage_gap
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 29-1222 Physicians, Patholog… 136. 88.8 46.8
## 2 29-1241 Ophthalmologists, Ex… 144. 109. 35.3
## 3 29-1216 General Internal Med… 150. 120. 30.7
## 4 29-1021 Dentists, General (2… 116. 89.9 25.7
## 5 29-1229 Physicians, All Othe… 124. 99.6 23.9
## 6 13-2052 Personal Financial A… 64.7 42.0 22.7
## 7 11-1011 Chief Executives (11… 161. 139. 22.7
## 8 29-1081 Podiatrists (29-1081) 111. 88.9 22.0
## 9 13-2081 Tax Examiners and Co… 48.1 28.0 20.1
## 10 29-1125 Recreational Therapi… 46.3 27.9 18.4
## # ℹ 2 more variables: employment_houston <dbl>, employment_texas <dbl>
top10_low_paid_high_employment <- wage_compare |>
filter(hourly_mean_houston < hourly_mean_texas) |>
mutate(
wage_gap = hourly_mean_texas - hourly_mean_houston
) |>
arrange(desc(employment_houston)) |>
distinct(soc_code, .keep_all = TRUE) |>
slice(1:10) |>
select(
soc_code,
occ_title = occupation_soc_code,
hourly_mean_houston,
hourly_mean_texas,
wage_gap,
employment_houston
)
print(top10_low_paid_high_employment)
## # A tibble: 10 × 6
## soc_code occ_title hourly_mean_houston hourly_mean_texas wage_gap
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 35-0000 Food Preparation and… 14.9 17.4 2.51
## 2 41-0000 Sales and Related Oc… 24.4 31.7 7.36
## 3 47-0000 Construction and Ext… 26.6 28.2 1.62
## 4 31-0000 Healthcare Support O… 16.4 19.2 2.77
## 5 35-3023 Fast Food and Counte… 12.8 14.2 1.42
## 6 37-0000 Building and Grounds… 16.3 17.1 0.798
## 7 41-2031 Retail Salespersons … 16.3 17.4 1.19
## 8 43-4051 Customer Service Rep… 20.4 20.9 0.487
## 9 53-7065 Stockers and Order F… 18.2 19.6 1.41
## 10 41-2011 Cashiers (41-2011) 14.1 14.8 0.682
## # ℹ 1 more variable: employment_houston <dbl>
top10_high_paid_low_employment <- wage_compare |>
filter(hourly_mean_houston > hourly_mean_texas) |>
mutate(
wage_gap = hourly_mean_houston - hourly_mean_texas
) |>
arrange(employment_houston) |>
distinct(soc_code, .keep_all = TRUE) |>
slice(1:10) |>
select(
soc_code,
occ_title = occupation_soc_code,
hourly_mean_houston,
hourly_mean_texas,
wage_gap,
employment_houston
)
print(top10_high_paid_low_employment)
## # A tibble: 10 × 6
## soc_code occ_title hourly_mean_houston hourly_mean_texas wage_gap
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 27-4099 Media and Communicat… 41.6 35.9 5.77
## 2 33-3031 Fish and Game Warden… 38.4 37.4 0.950
## 3 33-3041 Parking Enforcement … 18.5 18.2 0.287
## 4 19-4071 Forest and Conservat… 24.5 23.2 1.24
## 5 27-1012 Craft Artists (27-10… 24.0 23.6 0.383
## 6 53-4099 Rail Transportation … 14.6 13.8 0.810
## 7 19-4044 Hydrologic Technicia… 26.9 25.9 0.998
## 8 49-9069 Precision Instrument… 38.6 32.9 5.70
## 9 11-9013 Farmers, Ranchers, a… 40.9 38.7 2.13
## 10 29-1081 Podiatrists (29-1081) 111. 88.9 22.0
## # ℹ 1 more variable: employment_houston <dbl>