Real-Time Jobs Data

Using patented technology, Burning Glass aggregates, extracts, codes, and normalizes job data from more than 17,000 job boards, newspapers, employers, and other websites, supplying you with the accurate and comprehensive data you need for your job board, job bank, job search application or research project.

Job boards, job banks, and labor market analysts have one thing in common: they all depend on access to accurate, robust, and up-to-date jobs data. Burning Glass addresses this need with a data collection program that populates a real-time database of job opportunity information, representing the full scope of advertised labor demand as accurately as possible at the local, state, or national level.

Burning Glass’s system is the most extensively tested, identifies jobs from the greatest number of websites at the highest frequency, and, as a result, generates a larger database of current job opportunities than any other in the industry.

Our process has been refined over the course of a decade. The U.S. Patent and Trademark Office recognized the innovativeness of the approach by granting a patent on “determining whether a web site contains employment data” and then “formatting, parsing and storing the employment data and corresponding URL into a database”.

Having such a comprehensive repository of real-time jobs data is an important starting point. Historically, however, the potential of real-time analysis of advertised demand has been limited by the usability of the data. Because most job ads are written in free text, few reliable structured data fields could be extracted, and coding was typically limited to occupation and industry codes. Among other issues, words can carry considerable ambiguity in meaning and context that is not readily resolved in an automated way. Conventional systems approach such challenges through rudimentary semantic, or lexical, analysis, in which computers essentially infer the meaning of words by checking them against a thesaurus within a true/false decision-tree structure. Such systems are readily confounded when data values (e.g. job titles) are not found in their internal dictionaries or when data is presented in unexpected formats. To work around these limitations, such solutions restrict extraction to the few data elements most easily retrieved from lists or with fixed, hard-coded rules (e.g. job titles, locations, employer names).
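To make the failure mode concrete, here is a minimal sketch of a conventional dictionary-driven title coder. The thesaurus entries and occupation labels are hypothetical and do not represent any vendor's actual tables; the point is only that an exact lookup with a true/false branch cannot code a title it has never seen.

```python
from typing import Optional

# Hypothetical thesaurus mapping known title variants to an occupation label.
TITLE_THESAURUS = {
    "software engineer": "Software Developers",
    "software developer": "Software Developers",
    "registered nurse": "Registered Nurses",
    "rn": "Registered Nurses",
}


def code_title(raw_title: str) -> Optional[str]:
    """Rules-based coding: trivial normalization, then an exact dictionary lookup."""
    key = raw_title.strip().lower()
    # True/false branch: the title is either in the dictionary or coding fails.
    if key in TITLE_THESAURUS:
        return TITLE_THESAURUS[key]
    return None


for title in ["Software Engineer", "RN", "Sr. Software Engineer II", "Nurse, Registered (Nights)"]:
    print(f"{title} -> {code_title(title)}")
```

The first two titles are coded, while the last two, which a human would classify instantly, return `None` simply because they are not dictionary entries.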

Burning Glass addresses this issue with leading-edge, patented solutions for extracting, normalizing, and coding a broad range of information from each job description. As an alternative to conventional rules-based parsing, we have developed an approach based on a branch of artificial intelligence called Statistical Natural Language Processing (SNLP). This pattern-matching technology analyzes word patterns statistically, inferring concepts and functions from context and from comparison with past observations rather than from fixed rules, and thus avoids the limitations described above.
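Burning Glass's SNLP models are not described here, so the following is only a generic illustration of the statistical idea, using a swapped-in technique: a multinomial naive Bayes classifier that scores a new title by how well its words match patterns in previously labeled postings. The training examples and labels are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical history of postings already labeled with an occupation.
TRAINING = [
    ("senior software engineer python backend", "Software Developers"),
    ("software developer java web applications", "Software Developers"),
    ("full stack engineer javascript react", "Software Developers"),
    ("registered nurse icu night shift", "Registered Nurses"),
    ("staff nurse med surg patient care", "Registered Nurses"),
    ("rn emergency department patient triage", "Registered Nurses"),
]


def tokenize(text):
    """Lowercase and split into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesCoder:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, examples):
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.label_counts = Counter()            # label -> number of examples
        for text, label in examples:
            self.label_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab_size = len({w for c in self.word_counts.values() for w in c})
        self.n_examples = sum(self.label_counts.values())

    def log_score(self, text, label):
        """Log prior plus the smoothed log likelihood of each token."""
        counts = self.word_counts[label]
        total = sum(counts.values())
        score = math.log(self.label_counts[label] / self.n_examples)
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / (total + self.vocab_size))
        return score

    def predict(self, text):
        """Return the label whose past word patterns best explain the text."""
        return max(self.label_counts, key=lambda label: self.log_score(text, label))


coder = NaiveBayesCoder(TRAINING)
print(coder.predict("Sr. Software Engineer II"))    # Software Developers
print(coder.predict("Nurse, Registered (Nights)"))  # Registered Nurses
```

Neither test title appears verbatim in the training data, yet both are coded, because the decision rests on shared word evidence rather than on an exact dictionary match. A production system would use far more data and richer features.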

SNLP enables extraction of a comprehensive array of information directly from the job postings we retrieve from across the Internet, regardless of their format. From the extracted text, our advanced parsing engine codes and normalizes more than 70 data elements from any given posting, including:

  • Job function (O*NET)
  • Employer industry (NAICS)
  • Location (geo-coordinates, MSA, LMA)
  • Educational requirement (degree, level, major)
  • Source
  • Common and specialized skills (using a multilevel, hierarchical, and fully customizable dictionary of more than 15,000 skills)
  • Duration and level of experience
  • Plurality (i.e. does this represent just one job?)
  • Normalized salary
  • Intermediation (i.e. was this posted by a recruiter?)
  • Required certifications or licenses
  • Green job? (based on the actual skills or work activities referenced in the job ad)
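As one illustration of what "normalized salary" can mean, the sketch below annualizes the midpoint of a posted pay range. The conversion factors (a full-time 40-hour week over 52 weeks) and the `PayRange` structure are assumptions, not Burning Glass's documented rules, and the sketch assumes the figures have already been extracted from the ad text.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed factors for converting a pay period to a year (full-time, 40 h/week, 52 weeks).
PERIODS_PER_YEAR = {"hour": 2080, "day": 260, "week": 52, "month": 12, "year": 1}


@dataclass
class PayRange:
    low: float
    high: Optional[float]  # None when the ad states a single figure
    period: str            # one of the PERIODS_PER_YEAR keys


def normalized_annual_salary(pay: PayRange) -> float:
    """Annualize the midpoint of a posted pay range."""
    if pay.period not in PERIODS_PER_YEAR:
        raise ValueError(f"unknown pay period: {pay.period!r}")
    high = pay.high if pay.high is not None else pay.low
    midpoint = (pay.low + high) / 2
    return midpoint * PERIODS_PER_YEAR[pay.period]


print(normalized_annual_salary(PayRange(18, 22, "hour")))        # 41600.0
print(normalized_annual_salary(PayRange(4500, None, "month")))   # 54000.0
print(normalized_annual_salary(PayRange(55000, 65000, "year")))  # 60000.0
```

Putting every posting on one annual scale is what makes salary comparable across ads and usable as a condition in later rules.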


The ability not only to configure deduplication parameters but also to set multivariate conditions gives users an unmatched degree of flexibility and sensitivity to the different analytical contexts in which they operate. It also lets clients reflect real-world labor market behavior more accurately by making different assumptions for different co-occurrence conditions. For example, a user can specify that two identical low-salary positions posted on websites for different cities represent different jobs, while two identical high-salary positions posted on websites for different cities are listings for the same job. Burning Glass provides easy-to-use controls for setting and applying as many or as few condition-based rules as clients desire.
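The salary-dependent rule in the example above can be sketched as a small condition-based deduplicator. Everything here is hypothetical: the `Posting` fields, the definition of "identical" (same title, employer, and description), and the $75,000 threshold stand in for whatever parameters a client would actually configure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    title: str
    employer: str
    description: str
    city: str
    annual_salary: float


# Hypothetical client-configured parameter: at or above this salary, identical ads
# for different cities are assumed to describe one job.
HIGH_SALARY_THRESHOLD = 75_000


def same_content(a: Posting, b: Posting) -> bool:
    """Treat two ads as identical when title, employer, and description match."""
    return (a.title, a.employer, a.description) == (b.title, b.employer, b.description)


def is_duplicate(a: Posting, b: Posting, threshold: float = HIGH_SALARY_THRESHOLD) -> bool:
    if not same_content(a, b):
        return False
    if a.city == b.city:
        return True  # identical ad in the same city: always one job
    # Identical ad in different cities: the outcome depends on salary level.
    return min(a.annual_salary, b.annual_salary) >= threshold


def deduplicate(postings, threshold: float = HIGH_SALARY_THRESHOLD):
    """Keep each posting unless it duplicates one already kept."""
    kept = []
    for p in postings:
        if not any(is_duplicate(p, k, threshold) for k in kept):
            kept.append(p)
    return kept


ads = [
    Posting("Cashier", "ShopCo", "Run register", "Boston", 31_000),
    Posting("Cashier", "ShopCo", "Run register", "Worcester", 31_000),
    Posting("VP Engineering", "TechCo", "Lead org", "Boston", 240_000),
    Posting("VP Engineering", "TechCo", "Lead org", "Austin", 240_000),
]
print(len(deduplicate(ads)))  # 3: both cashier ads survive, the VP ads collapse to one
```

Changing `threshold` (or adding further conditions to `is_duplicate`) changes the count without touching the raw postings, which is the kind of flexibility the paragraph describes.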

Accessibility of Underlying Data


Burning Glass does not believe that aggregated statistics, on their own, are a sufficient basis for labor market analysis. Comprehensive analysis often requires examining the underlying jobs to build a robust understanding of market demand. As a result, and unlike other solutions, a license to our jobs data includes full access to the underlying content: we do not restrict licensees from viewing or presenting the jobs we spider. This matters not only for the scope of analysis it enables but also for the data integrity it ensures. Aggregation and cleanup methods may evolve over time, and without access to the underlying data, analysts cannot apply new routines to historical data to produce high-integrity trend analyses. Access to the underlying data also gives clients the flexibility to define their own protocols for aggregation and deduplication.

Burning Glass vs. Others

Burning Glass: We collect jobs from thousands of employers, educational institutions, public agencies, and other sources, not just job boards, providing a more robust representation of hiring across the whole economy, including small businesses. Most jobs are local, and more than 50% of recent job activity is attributable to small businesses, which tend not to advertise on the major job boards. Visiting their websites is time- and resource-intensive, but it pays off.

Others: Other spidering companies tend to rely predominantly on content picked up from major job boards. This excludes small businesses and other employers that choose not to post on the major boards because of expense or other reasons, and it can introduce bias into their data.

Burning Glass: We actively search for and “pull” jobs rather than relying on employers to “push” them by posting to a job board or labor exchange.

Others: Job exchanges are passive recipients of jobs; that is, they publish the jobs that employers send them. While useful, they are not comprehensive job repositories, for the simple reason that only a small fraction of employers make the effort to list their jobs with an exchange. Job boards typically follow the same passive model, but their scope is even more limited because they charge a fee for posting. Job board aggregators are also limited because they publish jobs only from sites to which they are specifically linked.

Burning Glass: We provide access to the underlying job listings, not just summary reports. Access to actual job content is critical for job seekers, workforce developers, labor market information (LMI) analysts, and others who need detailed information as a basis for important decisions and future efforts. Summaries provide limited information and are therefore of limited value to anyone who needs to know about skills and other requirements, not just job titles.

Others: Others provide only summary statistics on hiring activity, not the job listings themselves. This seriously limits the usefulness of the data for job seekers, workforce developers, and economic developers who need to know exactly which jobs to target. Similarly, without raw data, LMI analysts cannot apply their own deduplication logic or coding routines.

Burning Glass: Leveraging a field of artificial intelligence known as Statistical Natural Language Processing (SNLP), our patented technology has parsed millions of resumes and job listings, learning from the way each one was written and structured. This dynamic process enables our system to derive and infer data as well as extract it, generating more than 70 coded data elements for each resume and job listing. In addition, for each reported skill, our system understands how long, how recently, and in what context (e.g. in a job or a course) it was used. This skills-level intelligence dramatically expands the opportunities for actionable, real-time labor market insight. It is the difference between attracting an employer to the state by saying “We have 525 people in the Boston area looking for work right now who have the exact machining skill you need” and saying “We have lots of folks who have worked in manufacturing.”

Others: No other job feed provides as many data elements or as much enriched data. Other job feeds are limited to NAICS (industry-level) and O*NET (occupation-level) coding, which means they cannot provide data on skills, education, qualifications, required experience level, and similar attributes.
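The Boston example above amounts to a filtered count over skill records that carry duration, recency, and context. The sketch below shows that shape under stated assumptions: the `SkillMention` schema, the "CNC machining" label, and the default thresholds (at least one year of use, last used within two years, job context only) are all hypothetical rather than Burning Glass's actual data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SkillMention:
    """One skill reported by one person (hypothetical schema)."""
    person_id: str
    skill: str
    metro: str
    years_used: float   # how long the skill was used
    last_used: date     # how recently it was used
    context: str        # where it was used, e.g. "job" or "course"


def count_people_with_skill(mentions, skill, metro, as_of,
                            min_years_used=1.0, max_years_since_use=2.0,
                            contexts=("job",)):
    """Count distinct people in a metro whose use of a skill meets the thresholds."""
    matched = {
        m.person_id
        for m in mentions
        if m.skill == skill
        and m.metro == metro
        and m.context in contexts
        and m.years_used >= min_years_used
        and (as_of - m.last_used).days / 365.25 <= max_years_since_use
    }
    return len(matched)  # a set, so each person is counted once


mentions = [
    SkillMention("p1", "CNC machining", "Boston", 5.0, date(2023, 6, 1), "job"),
    SkillMention("p1", "CNC machining", "Boston", 2.0, date(2022, 1, 1), "job"),
    SkillMention("p2", "CNC machining", "Boston", 0.5, date(2023, 9, 1), "course"),
    SkillMention("p3", "CNC machining", "Boston", 3.0, date(2015, 1, 1), "job"),
    SkillMention("p4", "CNC machining", "Worcester", 4.0, date(2023, 3, 1), "job"),
]
as_of = date(2024, 1, 1)
print(count_people_with_skill(mentions, "CNC machining", "Boston", as_of))  # 1
print(count_people_with_skill(mentions, "CNC machining", "Boston", as_of,
                              min_years_used=0.0, contexts=("job", "course")))  # 2
```

With occupation-level coding alone, all four people would fall into one undifferentiated "manufacturing" bucket; the per-skill duration, recency, and context fields are what allow the narrower count.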