Applying Exponential Family Embeddings in Natural Language Processing to Analyze Text

1:55 - 2:25 | Friday, May 24


Many data scientists are familiar with word embedding models such as word2vec, which capture the semantic similarity of words in a large corpus. However, word embeddings are limited in their ability to interrogate a corpus alongside other context or over time. Moreover, word embedding models either need significant amounts of data or must be tuned, through transfer learning, to the domain-specific vocabulary that is unique to each commercial application.

In this talk, Maryam will introduce exponential family embeddings. Developed by Rudolph and Blei, these methods extend the idea of word embeddings to other types of high-dimensional data. She will demonstrate how they can be used to conduct advanced topic modeling on medium-sized datasets that are specialized enough to require significant modifications to a word2vec model and that contain more general data types (including categorical, count and continuous data). Maryam will discuss how TapRecruit implemented a dynamic embedding model using TensorFlow and its proprietary corpus of job descriptions. Using both categorical and natural language data associated with jobs, the team charted the development of different skill sets over the last 3 years. Maryam will focus specifically on how tech and data science skill sets have developed, grown and cross-pollinated other types of jobs over time.
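To make the core idea concrete, here is a minimal sketch of an exponential family embedding for count data. In this family of models each item (here, a skill) has an embedding vector rho and a context vector alpha. Conditional on its context, an item's count x_i follows an exponential family distribution whose natural parameter is eta_i = rho_i · sum_j alpha_j * x_j. The sketch assumes a Poisson distribution, treats the other skills in the same job posting as the context, and fits by plain full-batch gradient ascent. The toy data, the function names (`init_params`, `natural_param`, `penalized_objective`, `fit`) and the hyperparameters are all hypothetical. This is not TapRecruit's TensorFlow implementation.

```python
import math
import random


def init_params(n_items, dim, seed=0):
    """Small random embedding (rho) and context (alpha) vectors per item."""
    rng = random.Random(seed)
    rho = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    alpha = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    return rho, alpha


def natural_param(counts, i, rho, alpha):
    """eta_i = rho_i . sum_{j != i} alpha_j * x_j (context = rest of the posting)."""
    ctx = [0.0] * len(rho[i])
    for j, x_j in enumerate(counts):
        if j != i and x_j:
            for k in range(len(ctx)):
                ctx[k] += alpha[j][k] * x_j
    eta = sum(r * c for r, c in zip(rho[i], ctx))
    return eta, ctx


def penalized_objective(postings, rho, alpha, l2):
    """Poisson log-likelihood of each count given its context, minus an L2 penalty."""
    total = 0.0
    for counts in postings:
        for i, x_i in enumerate(counts):
            eta, _ = natural_param(counts, i, rho, alpha)
            # log Poisson(x_i | rate = exp(eta))
            total += x_i * eta - math.exp(eta) - math.lgamma(x_i + 1)
    penalty = sum(v * v for row in rho + alpha for v in row)
    return total - 0.5 * l2 * penalty


def fit(postings, n_items, dim=2, lr=0.01, l2=0.1, epochs=200, seed=0):
    """Full-batch gradient ascent on the penalized objective (hypothetical settings)."""
    rho, alpha = init_params(n_items, dim, seed)
    for _ in range(epochs):
        # Gradient of the L2 penalty term.
        g_rho = [[-l2 * v for v in row] for row in rho]
        g_alpha = [[-l2 * v for v in row] for row in alpha]
        for counts in postings:
            for i, x_i in enumerate(counts):
                eta, ctx = natural_param(counts, i, rho, alpha)
                resid = x_i - math.exp(eta)  # d(log-likelihood)/d(eta) for Poisson
                for k in range(dim):
                    g_rho[i][k] += resid * ctx[k]
                for j, x_j in enumerate(counts):
                    if j != i and x_j:
                        for k in range(dim):
                            g_alpha[j][k] += resid * rho[i][k] * x_j
        # Apply the accumulated gradients after the full pass over the data.
        for params, grads in ((rho, g_rho), (alpha, g_alpha)):
            for row, g_row in zip(params, grads):
                for k in range(dim):
                    row[k] += lr * g_row[k]
    return rho, alpha


# Hypothetical toy data: rows are job postings, columns are counts of four skills.
postings = [[2, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 2],
            [0, 1, 1, 1], [1, 2, 0, 0], [0, 0, 2, 1]]
rho, alpha = fit(postings, n_items=4)
```

Skills that tend to appear together end up with embedding and context vectors that raise each other's predicted counts. A dynamic variant, in the spirit of the model described above, would instead give each time slice its own rho vectors, tied together by a Gaussian random-walk prior, and would share alpha across slices. That extension is not shown here.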

Key takeaways: (1) Lessons learnt from implementing different word embedding methods (from pretrained to custom); (2) How to map trends from a combination of natural language and structured data; (3) How data science skills have varied across industries, functions and over time.

Maryam Jahanshahi
Research Scientist, TapRecruit

Maryam runs research at TapRecruit, a startup that is building software tools to implement evidence-based talent management. TapRecruit's research program integrates recent advances in NLP, data science and decision science to identify robust methods to reduce bias in talent decision-making and attract more qualified and diverse candidate pools. In a past life, Maryam was a cancer biologist and a data journalist. She holds a PhD from the Icahn School of Medicine at Mount Sinai.