Semantic Job Recommendation Engine
Job Recommendation Engine
Harshendra, Mahak Gambhir, Vishal Gupta
Introduction
Most job applications are made through resumes
Shortlisting resumes is a painful task
Especially when there are thousands of them
Manual checking is not feasible
Semantic Search
What is not semantic search?
Search based on keywords?
Ex: grep "icpc"
Can we make it better?
Weightage to sections
To be able to search in a particular section
Recommend resumes similar to a particular resume
NLP vs Natural Language Processing
Challenges
Structure of a resume
Content segregation: how well is it written?
Extraction of entities
Names
Marks/Grades
Previous Organizations
Years of experience
Skills
Projects
Ranking among the shortlisted resumes
Problems addressed
Creation of an infobox for each resume
Name: XYZ
Email: [email protected]
Phone: xx-xxxxxxxxx
Skills
Projects
A search interface
Query> Machine Learning, ACM ICPC, Scikit Learn
Recommend resumes similar to a given resume
Resume Path> /path/to/resume
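As an illustration of the infobox idea, here is a minimal sketch that pulls contact fields out of resume text and emits a JSON summary. The regular expressions, the `build_infobox` helper, and the sample resume text are all made up for this example; the slides do not say how email and phone were extracted, and the name, skills, and projects are simply passed in here rather than extracted.

```python
import json
import re

# Hypothetical patterns for illustration; real resumes need more robust handling.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d -]{8,}\d")


def build_infobox(text, name=None, skills=(), projects=()):
    """Return a small summary dict for one resume (fields follow the slide)."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        "name": name,
        "email": email.group(0) if email else None,
        "phone": phone.group(0) if phone else None,
        "skills": list(skills),
        "projects": list(projects),
    }


# Made-up resume text used only to exercise the sketch.
sample = "XYZ\nEmail: xyz@example.com\nPhone: 91-9876543210\nSkills: Python, NLTK"
infobox = build_infobox(sample, name="XYZ", skills=["Python", "NLTK"])
print(json.dumps(infobox, indent=2, sort_keys=True))
```

Fields that cannot be found are left as `None` so that a missing value is visible in the summary instead of being silently guessed.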
The Process Flow
Resumes in PDF format
PDF to text
Resumes in text format
Preprocessing
Parsing
Extract features
Ranking Model
Top K Candidates
Infoboxes (used to display a brief summary of the candidate)
JSON files & indexes
Approach
Each resume is a text document
Extract the sections in each resume
NER for names
Index the resumes according to sections
Extract features from each resume
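The section extraction and per-section indexing steps could be sketched roughly as follows. This is an assumption-laden illustration, not the project's code: the alias table is a small hypothetical subset, headings are assumed to sit alone on a line, and `split_sections` and `build_section_index` are names invented here.

```python
import re
from collections import defaultdict

# Illustrative heading aliases (hypothetical subset); a real list would be longer.
SECTION_ALIASES = {
    "projects": ["project", "projects", "academic project",
                 "major projects", "minor projects"],
    "skills": ["skills", "technical skills", "technical expertise"],
}
HEADING_TO_SECTION = {
    alias: name for name, aliases in SECTION_ALIASES.items() for alias in aliases
}


def split_sections(text):
    """Group non-empty lines under the most recent recognised heading."""
    sections = defaultdict(list)
    current = "other"
    for line in text.splitlines():
        key = line.strip().rstrip(":").lower()
        if key in HEADING_TO_SECTION:
            current = HEADING_TO_SECTION[key]
        elif line.strip():
            sections[current].append(line.strip())
    return dict(sections)


def build_section_index(resumes):
    """Map (section, token) -> set of resume ids containing that token."""
    index = defaultdict(set)
    for resume_id, text in resumes.items():
        for section, lines in split_sections(text).items():
            for token in re.findall(r"[a-z0-9+#]+", " ".join(lines).lower()):
                index[(section, token)].add(resume_id)
    return index


# Made-up resumes used only to exercise the sketch.
resumes = {
    "r1": "XYZ\nTechnical Skills\nPython, Scikit-learn\nAcademic Project\nResume ranking engine",
    "r2": "ABC\nSkills:\nJava\nProjects\nPython web crawler",
}
index = build_section_index(resumes)
print(sorted(index[("skills", "python")]))    # resumes mentioning python under skills
print(sorted(index[("projects", "python")]))  # resumes mentioning python under projects
```

Keying the index on the (section, token) pair is what makes it possible to search within one section only, or to weight a match in one section differently from a match in another.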
Technical Details
Named Entity Recognition for extracting names
Used Stanford NER
Parse the sections using most common section names
Project/Projects/Academic Project/Major Projects/Minor Projects
Skills/Technical Skills/Technical Expertise and so on
Stanford temporal tagger for extracting the temporal information
Jan 2012 - Jan 2014 (2 years)
Useful in extracting the years of experience
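To make the date-range step concrete, the sketch below turns ranges such as "Jan 2012 - Jan 2014" into months of experience. It deliberately swaps in a plain regular expression for the Stanford temporal tagger used in the project, so it only handles the "Mon YYYY - Mon YYYY" pattern; `months_of_experience` and the sample text are invented for illustration.

```python
import re

MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}

# Matches e.g. "Jan 2012 - Jan 2014" or "June 2011 to Aug 2011".
RANGE_RE = re.compile(
    r"([A-Za-z]{3})[a-z]*\.?\s+(\d{4})\s*(?:-|to)\s*([A-Za-z]{3})[a-z]*\.?\s+(\d{4})"
)


def months_of_experience(text):
    """Sum the lengths, in months, of all recognised date ranges in the text."""
    total = 0
    for m1, y1, m2, y2 in RANGE_RE.findall(text):
        if m1.lower() in MONTHS and m2.lower() in MONTHS:
            start = int(y1) * 12 + MONTHS[m1.lower()]
            end = int(y2) * 12 + MONTHS[m2.lower()]
            total += max(end - start, 0)
    return total


# Made-up experience section used only to exercise the sketch.
text = "Software Engineer, Jan 2012 - Jan 2014\nIntern, June 2011 to Aug 2011"
print(months_of_experience(text) / 12.0)  # total experience in years
```

Overlapping ranges are counted twice by this simple sum, which is one reason a proper temporal tagger plus interval merging is preferable on real data.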
Technical Details
Bag of words approach
Vector space model
Extract features for each resume
Boolean
Whether a term is present in the document d or not
Tf-Idf
Tf-idf weight of a term w.r.t. the document d
Rank the resumes based on distance functions
Cosine Similarity
Euclidean distance
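A small sketch of the Boolean feature variant and the two distance functions named above, using only the standard library. The whitespace tokenizer, the helper names, and the two toy documents are assumptions made for this example; the project itself lists scikit-learn among its tools.

```python
import math


def tokenize(text):
    return text.lower().split()


def boolean_vector(text, vocabulary):
    """1 if the vocabulary term occurs in the text, else 0."""
    present = set(tokenize(text))
    return [1 if term in present else 0 for term in vocabulary]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy documents used only to exercise the sketch.
docs = {
    "d1": "python machine learning",
    "d2": "java web development",
}
vocabulary = sorted({t for text in docs.values() for t in tokenize(text)})
query = boolean_vector("machine learning", vocabulary)
for name, text in sorted(docs.items()):
    vec = boolean_vector(text, vocabulary)
    print(name, cosine_similarity(vec, query), euclidean_distance(vec, query))
```

Note the opposite directions: a higher cosine similarity means a closer match, while a lower Euclidean distance means a closer match.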
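Features
Let $|V|$ be the vocabulary size, $N$ the number of resumes, and $d_1$ and $d_2$ two resumes.
$d_1$: $v_1 = \langle v_{11}, v_{12}, \dots, v_{1|V|} \rangle$
$d_2$: $v_2 = \langle v_{21}, v_{22}, \dots, v_{2|V|} \rangle$
where $v_{ij} = \text{Tf-Idf}(d_i, t_j)$ and $t_j$ is the $j$th term in the vocabulary.
$\text{Tf-Idf}(d, t) = tf_{t,d} \times \log(N / df_t)$
$tf_{t,d}$: term frequency of term $t$ in document $d$
$df_t$: document frequency of term $t$ (the number of resumes containing $t$)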
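The tf-idf formula can be worked through in a few lines of code. This sketch assumes the N inside the logarithm is the number of resumes in the collection (the usual tf-idf convention), uses raw counts for term frequency and the natural logarithm, and tokenizes on whitespace; `tfidf_vectors` and the two toy documents are invented for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, {doc name: list of tf * log(N / df) weights})."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    vocabulary = sorted(df)
    vectors = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[name] = [
            tf[t] * math.log(n_docs / float(df[t])) for t in vocabulary
        ]
    return vocabulary, vectors


# Toy documents used only to exercise the sketch.
docs = {"d1": "python python nltk", "d2": "python java"}
vocabulary, vectors = tfidf_vectors(docs)
for name in sorted(vectors):
    print(name, list(zip(vocabulary, vectors[name])))
```

In this toy collection "python" occurs in every document, so its weight is zero everywhere regardless of how often it is repeated, while "nltk" and "java" each receive a weight of log(2) in the one document that contains them.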
Ranking
Convert the test query/document into a tf-idf feature vector t
Ex: test = "acm icpc, machine learning"
Calculate cosine similarity with each document and rank them
cos-sim(d1,t), cos-sim(d2,t) and so on
Rank the resumes in non-increasing order of similarity
Choose top K
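The ranking step might look like the sketch below: the query is weighted with the idf values of the collection, every resume is scored by cosine similarity, and the K highest scores are kept (most similar first). Sparse dict vectors, the comma-stripping tokenizer, the choice to drop query terms unseen in the collection, and the three toy resumes are all assumptions of this example.

```python
import heapq
import math
from collections import Counter


def tokenize(text):
    return text.lower().replace(",", " ").split()


def build_idf(docs):
    """idf(t) = log(N / df_t), with N the number of documents."""
    df = Counter()
    for text in docs.values():
        df.update(set(tokenize(text)))
    n = float(len(docs))
    return {term: math.log(n / count) for term, count in df.items()}


def vectorize(text, idf):
    # Sparse tf-idf vector; terms unseen in the collection are dropped.
    tf = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def top_k(query, docs, k):
    """Return the k (score, name) pairs with the highest cosine similarity."""
    idf = build_idf(docs)
    q = vectorize(query, idf)
    scores = [(cosine(vectorize(text, idf), q), name) for name, text in docs.items()]
    return heapq.nlargest(k, scores)


# Toy resumes used only to exercise the sketch.
docs = {
    "r1": "acm icpc finalist, machine learning projects",
    "r2": "web development with java",
    "r3": "machine learning research, python",
}
for score, name in top_k("acm icpc, machine learning", docs, k=2):
    print(name, round(score, 3))
```

Passing the text of an existing resume as the query gives the "recommend resumes similar to a given resume" behaviour with the same function.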
Tools/Frameworks
Language: Python 2.7
Frameworks: scikit-learn
NLTK
Stanford NER tagger
Stanford temporal tagger
Intuition!
Demo
Links
Code: https://github.com/gvishal/Sematic-Job-Recommendation-Engine
Deliverables: https://www.dropbox.com/home/IRE_Major_Project/deliverables
Presentation: http://www.slideshare.net/vishalg2/semantic-job-recommendation-engine
Web Page: http://mahakgambhir.github.io/Sematic-Job-Recommendation-Engine/
Video: https://www.youtube.com/watch?v=EFiOd4-i2FM