Majors that Matter Study Methodology | Burning Glass Technologies

Majors that Matter: Report Methodology

Majors that Matter: Ensuring College Graduates Avoid Underemployment

Part 1: Data Sources Used in the Report

The data used in this paper were primarily extracted from Burning Glass Technologies’ unique data assets: a database of more than 800 million job postings providing a detailed view into the jobs and skills that employers demand and a database of more than 80 million resumes illuminating the actual career progression of American workers. We also drew from federal surveys and administrative data sets relating to degree completion, majors, and workers’ earnings.


Resume Data

The analyses of workers’ career outcomes were pulled from Burning Glass’s resume database, which captures the detailed work history and education of millions of workers across the United States. Resumes are collected from Burning Glass’s partners. Resumes were included in this study if they met the following criteria: the worker has a bachelor’s degree and at least five years of work experience thereafter. The analyses in this report were based on four million resumes that met these criteria, covering work experiences from 2000 to 2017. Further details about our treatment of the resume data are described in Part 2 of the Appendix.

This report was based on aggregate career path and skills data and no personally identifiable information was used by researchers. Burning Glass Technologies has developed a database of millions of recent resumes. When a resume enters the system, the name, address, and other identifying details are encrypted so that they are not accessible to the research team. Researchers compile resumes with similar characteristics so that they can determine which types of transitions and career progressions commonly occur at a population level.


Job Postings Data

To supplement traditional sources of labor market data with more detailed information on employer demand for jobs, skills, and specific credentials, Burning Glass mined its comprehensive database of over 800 million online job postings. Burning Glass collects job postings from close to 50,000 online job boards, newspapers, and employer sites on a daily basis and de-duplicates postings for the same job, whether it is posted multiple times on the same site or across multiple sites. Burning Glass then applies detailed text analytics to code the specific jobs, skills, and credentials requested by employers.



O*NET

O*NET[1] is a government-sponsored, publicly available database containing hundreds of standardized and occupation-specific descriptors on almost 1,000 occupations covering the entire U.S. economy. O*NET tracks job trends and analyzes skill level by occupation, that is, whether the skills necessary for a particular job are taught in high school, entail some college, or require a bachelor’s degree or more. The O*NET database was initially populated by data collected from occupation analysts; this information is updated by ongoing surveys of each occupation’s worker population and occupation experts.


American Community Survey

The American Community Survey (ACS)[2] is an ongoing annual survey of Americans that provides data on jobs and occupations, educational attainment, and veteran status, among other topics.

Part 2: Overview of Resume Analyses

The resume dataset is a Burning Glass Technologies proprietary dataset, sourced from Burning Glass partners. This dataset includes information about an individual’s demographics, career path, and employers.

The resume dataset contains information about an individual’s location, level of educational attainment, the institutions at which he or she studied, the major, and any certifications held. The dataset also contains information about an individual’s career path, for example, occupation and time spent in each workplace and role, years of experience, employer name and location, and industry. In addition, an individual resume may list skills and the years of experience with each skill. All personally identifiable information, such as name, address, and contact information, is encrypted and not available to researchers.

Resume Sample Selection

To capture the work history, educational attainment, and resulting underemployment of workers over the life of their careers, Burning Glass selected a total of four million resumes for inclusion in this study, based on the following criteria:

  1. Individuals in the selected group must have commenced their first job during or after the year 2000, where an individual’s first job was classified as the first job listed on a resume.
  2. The time worked in the first job must have been longer than six months, to avoid internships and other short-term projects.
  3. Individuals must have occupational information about a first job and the job five years later. For a subsample of resumes, we also assessed underemployment 10 years later, where job data were available.
  4. Individuals must hold a bachelor’s degree or higher. This restriction was imposed because the underemployment of workers was calculated within the sample of workers with bachelor’s degrees or higher.
  5. At each point in the analysis, individuals must have had civilian employment, as military occupations have a distinct hiring system for which research on underemployment is not germane.
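The five criteria above can be read as a single filter over resume records. The following is a minimal sketch of that reading, not Burning Glass's actual pipeline: the `Resume` record and all of its field names are hypothetical, and the military check is applied only at the first job and the five-year mark.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resume:
    # All field names are hypothetical, chosen for illustration only.
    first_job_start_year: int
    first_job_months: float
    first_job_occupation: Optional[str]
    occupation_at_5_years: Optional[str]
    has_bachelors_or_higher: bool
    first_job_is_military: bool
    job_at_5_years_is_military: bool


def meets_criteria(r: Resume) -> bool:
    """Return True if a resume satisfies the five selection criteria."""
    return (
        r.first_job_start_year >= 2000            # 1. first job in or after 2000
        and r.first_job_months > 6                # 2. first job longer than six months
        and r.first_job_occupation is not None    # 3. occupation known at first job...
        and r.occupation_at_5_years is not None   #    ...and five years later
        and r.has_bachelors_or_higher             # 4. bachelor's degree or higher
        and not r.first_job_is_military           # 5. civilian employment at each
        and not r.job_at_5_years_is_military      #    point in the analysis
    )
```

A 10-year follow-up would add a further optional occupation field that is checked only for the subsample where job data are available.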


Coding Occupation and Education from Resumes

For this analysis, we collected information for our samples based on an individual’s occupation in a first job, five years later, and 10 years later (where available). Our occupation coding is based on the occupational definitions provided by O*NET,[3] which extends the U.S. Department of Labor’s Standard Occupational Classification System.[4]

Occupation coding is conducted according to a proprietary classification system developed by Burning Glass, which combines human-generated rules and machine-learning systems to assign each job to the correct occupational category.

We analyzed individuals’ education by categorizing the undergraduate program of study according to the National Center for Education Statistics’ Classification of Instructional Programs (CIP).[5]

Predicting Gender in Resumes

To study the effect of gender on underemployment, we used the gender R package to predict the gender of each individual in the resume sample. The package uses an estimated date of birth (1970–2000) and the first name from the resume to predict gender based on historical Social Security Administration data.[6],[7] Using this approach, we estimated the probability that each individual in the sample was of a particular gender and assigned the predicted gender only when that probability was 0.6 or higher. Individuals for whom no gender prediction was possible were excluded from the gender-specific analyses. The gender prediction was carried out before any further analysis of the data, and the gender data available to researchers were attached to anonymized records. At no time were names or other personally identifiable information available to researchers.
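The thresholding step can be sketched as follows. This is written in Python for illustration and does not call the gender R package; the function name and the `prob_female` input are hypothetical stand-ins for whatever probability the name-based prediction yields.

```python
from typing import Optional

THRESHOLD = 0.6  # cutoff probability stated in the text


def assign_gender(prob_female: Optional[float],
                  threshold: float = THRESHOLD) -> Optional[str]:
    """Map a predicted probability to a gender label, or None if undecided.

    prob_female is a hypothetical input: the estimated probability that a
    first name (given an estimated birth year) belongs to a woman.
    """
    if prob_female is None:
        return None  # no prediction possible for this name
    if prob_female >= threshold:
        return "female"
    if 1.0 - prob_female >= threshold:
        return "male"
    return None  # too ambiguous; excluded from gender-specific analyses
```

Under this reading, a name with a predicted probability of 0.55 would be left unlabeled and dropped from the gender-specific analyses.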


Part 3: Calculating Expected Salary using Data from the American Community Survey (ACS)

Since resumes do not typically include salary information, we used Census Bureau data from the ACS to estimate salary based on the occupational and demographic characteristics of each worker. We used pooled one-year samples from 2012 to 2017. We focused on individuals aged 22 to 27 (recent college graduates) and restricted the sample to those who were employed, had at least a bachelor’s degree, were not enrolled in any educational program, and worked at least 30 hours per week.[8] For these individuals, we examined gender, major, occupation, and salary. Incomes were restricted to those between $15,000 and $200,000 per year.

To arrive at the cost of underemployment, we estimated the average salary of the underemployed for each major and then compared that with the average salary of properly employed graduates with the same major. We then expressed the cost as the percentage of salary that individuals lose because they are not properly employed.
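As an illustration, the comparison for a single major might look like the sketch below. It assumes the percentage is taken relative to the average salary of properly employed graduates, which the text does not state explicitly; the function name and sample figures are made up.

```python
from statistics import mean
from typing import Sequence


def underemployment_cost_pct(properly_employed: Sequence[float],
                             underemployed: Sequence[float]) -> float:
    """Percentage of salary lost by underemployed graduates of one major.

    Assumption: the denominator is the mean salary of properly employed
    graduates with the same major.
    """
    proper_avg = mean(properly_employed)
    under_avg = mean(underemployed)
    return (proper_avg - under_avg) / proper_avg * 100.0


# Illustrative figures only, not taken from the report.
cost = underemployment_cost_pct([50_000, 60_000], [40_000, 48_000])
```

With these made-up salaries the averages are $55,000 and $44,000, so the cost is about 20% of salary.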

To estimate the 15-year losses, we calculated present value estimates of five-year average salaries for three age cohorts: 22 to 26, 27 to 31, and 32 to 36. For the present value calculation, we used growth rates estimated directly from the ACS data and a discount rate of 3.5%.
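One way to read this calculation is sketched below. It assumes that each of the five years in a cohort earns that cohort's average salary, that discounting is annual starting at age 22, and that salary growth is already reflected in the cohort averages passed in. The report does not spell out these details, so the function names and structure are illustrative only.

```python
from typing import Sequence

DISCOUNT_RATE = 0.035  # discount rate stated in the text


def present_value(cohort_salaries: Sequence[float],
                  discount_rate: float = DISCOUNT_RATE,
                  years_per_cohort: int = 5) -> float:
    """Discount a stream of annual salaries back to the first year.

    cohort_salaries holds one average annual salary per cohort, in age
    order (for example 22 to 26, 27 to 31, 32 to 36).
    """
    total = 0.0
    year = 0
    for salary in cohort_salaries:
        for _ in range(years_per_cohort):
            total += salary / (1.0 + discount_rate) ** year
            year += 1
    return total


def fifteen_year_loss(properly_employed: Sequence[float],
                      underemployed: Sequence[float]) -> float:
    """Difference in present value between the two salary streams."""
    return present_value(properly_employed) - present_value(underemployed)
```

Passing three cohort averages for each group yields a 15-year loss in present-value dollars; the same function could be applied per major.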


Part 4: Supply-Demand Model and Key Skills

Following our work with the U.S. Chamber of Commerce,[9],[10] we developed a supply-demand model that compares the number of open positions with the number of available workers in the field for each occupation and major.


To measure demand, we used an econometric model that starts with the total postings collected by Burning Glass by occupation and normalizes them to equal the total number of national openings reported by the Bureau of Labor Statistics’ Job Openings and Labor Turnover Survey (JOLTS). Supply is based on the total number of job separations reported by JOLTS. We then estimated a turnover rate for each occupation based on data from the Census Bureau’s Current Population Survey (CPS) and determined the number of available workers by multiplying that turnover rate by total employment in each industry and occupation. To estimate supply and demand by major, we applied the distribution of college majors to occupation data from the ACS. Demand and supply were then compared to determine the ratio used as a summary statistic for each major.
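The arithmetic of these steps can be sketched as follows. The sketch replaces the econometric model with a simple proportional rescaling of postings to the JOLTS total, which is a simplifying assumption, and every function and variable name is hypothetical.

```python
from typing import Dict


def normalize_postings(postings_by_occ: Dict[str, float],
                       jolts_total_openings: float) -> Dict[str, float]:
    """Rescale postings so they sum to the JOLTS national openings total.

    Assumption: proportional scaling stands in for the econometric model.
    """
    scale = jolts_total_openings / sum(postings_by_occ.values())
    return {occ: n * scale for occ, n in postings_by_occ.items()}


def available_workers(employment_by_occ: Dict[str, float],
                      turnover_rate_by_occ: Dict[str, float]) -> Dict[str, float]:
    """Supply: turnover rate multiplied by total employment."""
    return {occ: employment_by_occ[occ] * turnover_rate_by_occ[occ]
            for occ in employment_by_occ}


def allocate_to_majors(values_by_occ: Dict[str, float],
                       major_shares_by_occ: Dict[str, Dict[str, float]]
                       ) -> Dict[str, float]:
    """Spread occupation-level counts across majors using ACS-style shares."""
    totals: Dict[str, float] = {}
    for occ, value in values_by_occ.items():
        for major, share in major_shares_by_occ.get(occ, {}).items():
            totals[major] = totals.get(major, 0.0) + value * share
    return totals


def demand_supply_ratio(demand: Dict[str, float],
                        supply: Dict[str, float]) -> Dict[str, float]:
    """Ratio of demand to supply for every key with positive supply."""
    return {k: demand[k] / supply[k] for k in demand if supply.get(k, 0) > 0}
```

Applying `allocate_to_majors` to both the demand and the supply dictionaries before taking the ratio gives one summary figure per major; a ratio above 1 means more openings than available workers.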


The supply-demand model is also used to estimate gaps in key skills. Using postings data, we estimated the skills demanded by employers using the recall rate[11] from the latest year of available data. For the supply of skills, we used the average recall rate in postings data from the past five years. The assumption here is that the skills employers sought during the past five years are the skills that employees and students possess today; implicitly, the labor market responds to employer demand, but with some lag. To obtain the specific numbers demanded and supplied for each skill, we multiplied the demand and supply numbers from the model above by the appropriate recall rates.
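For a single skill, this calculation reduces to two multiplications and a subtraction. A minimal sketch follows; the function name is hypothetical, the sample figures are made up, and the gap is assumed to be demand minus supply.

```python
from typing import Sequence, Tuple


def skill_gap(demand: float,
              supply: float,
              recall_latest_year: float,
              recall_past_five_years: Sequence[float]) -> Tuple[float, float, float]:
    """Return (skill demand, skill supply, gap) for one skill.

    demand and supply are the worker counts from the supply-demand model.
    Recall rates are fractions of postings that call for the skill.
    """
    supply_recall = sum(recall_past_five_years) / len(recall_past_five_years)
    demanded = demand * recall_latest_year   # latest-year recall drives demand
    supplied = supply * supply_recall        # five-year average drives supply
    return demanded, supplied, demanded - supplied


# Illustrative figures only, not taken from the report.
demanded, supplied, gap = skill_gap(1000, 800, 0.30, [0.10, 0.15, 0.20, 0.25, 0.30])
```

Because the supply side uses a five-year average, a skill whose recall rate is rising quickly shows a larger gap than one whose recall rate is stable.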


Part 5: Salary Model and Skills Premia

Burning Glass Technologies has developed a deep neural network model[12] to predict the salary of postings that do not contain that information. While average salaries could be estimated directly from postings data, only 15% of postings contain salary information. However, most postings contain information that can be used to estimate the salary precisely, such as education, experience, job title, and skills. Using the predicted salary can lead to more accurate estimates of salary premia because of the larger sample size.

While neural network models predict well, the results they produce are not easily interpreted. To estimate the effect of a feature on salary, we used a Dropped Feature Analysis (DFA). Given a set of input features, the model produces a prediction, which we call the base salary (all features are included). If we drop a single feature from the input, the model produces a second prediction, which may be greater than or less than the base salary. If the second prediction is less than the base salary, this indicates that the dropped feature is important to this input set. Using this method, we can estimate the added value of each skill for an individual posting or for an occupation.
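The procedure can be sketched generically for any prediction function. In the example below, `toy_predict` is a made-up additive stand-in for the neural network, and the feature names and dollar weights are invented for illustration.

```python
from typing import Callable, Dict, FrozenSet, Iterable


def dropped_feature_analysis(predict: Callable[[FrozenSet[str]], float],
                             features: Iterable[str]) -> Dict[str, float]:
    """Estimate each feature's effect as base prediction minus the
    prediction made with that single feature dropped."""
    full = frozenset(features)
    base = predict(full)  # base salary: all features included
    return {f: base - predict(full - {f}) for f in full}


# Hypothetical stand-in for the salary model: a fixed base plus weights.
TOY_WEIGHTS = {"python": 8000.0, "excel": 1500.0}


def toy_predict(features: FrozenSet[str]) -> float:
    return 40000.0 + sum(TOY_WEIGHTS.get(f, 0.0) for f in features)


effects = dropped_feature_analysis(toy_predict, ["python", "excel", "bachelor"])
```

A positive effect means the prediction falls when the feature is dropped, so the feature adds value for that input; averaging effects over the postings in an occupation would give an occupation-level estimate.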

[1] O*NET Resource Center, “About O*NET,” accessed Aug. 30, 2018.

[2] U.S. Census Bureau, “About the American Community Survey,” accessed Sept. 10, 2018.

[3] O*NET Resource Center, “The O*NET-SOC Taxonomy,” accessed Aug. 20, 2018.

[4] Bureau of Labor Statistics, “Standard Occupational Classification,” accessed Sept. 10, 2018.

[5] See for more information. For the purpose of this analysis, we merged CIP code 14 (Engineering) and CIP code 15 (Engineering Technologies and Engineering-Related Fields) and treated them as a single major.

[6] This R package uses historical datasets from the U.S. Social Security Administration, the U.S. Census Bureau (via IPUMS USA), and the North Atlantic Population Project to provide predictions of gender for first names for particular countries and time periods.

[7] Blevins, Cameron, and Lincoln Mullen. “Jane, John… Leslie? A Historical Method for Algorithmic Gender Prediction.” DHQ: Digital Humanities Quarterly 9, no. 3 (2015).

[8] Here, we follow the same selection criteria as in Abel, Jaison R., and Richard Deitz, “Underemployment in the early careers of college graduates following the Great Recession.” In Education, Skills, and Technical Change: Implications for Future US GDP Growth. University of Chicago Press, 2017.


[9] Burning Glass Technologies, “Different Skills, Different Gaps: Measuring and Closing the Skills Gap,” 2018.

[10] One important extension of this model, over the work with the U.S. Chamber of Commerce, is that it considers both the number of workers who are retiring and the new flow of students.

[11] Recall rate for skills is defined as the percentage of postings that are calling for a specific skill.

[12] Chewning, Keith, Liu, Zhiyuan, and Gaurav, Manish. “Learning a Semantic Representation of Atomic Entities for Salary Prediction.” In Proceedings of KDD 2018. ACM, New York, 2018.
