Analysing CVs with a local LLM
Some time ago I decided to test services that offer to review your CV (some for free). The results were shockingly bad. The "analyses" were LLM-generated and full of errors. Just a few examples (reportedly written by a "Resume Expert"):
- "you should always add your email address AND phone number" (they were on the second line, clearly visible),
- "I didn’t find a professional summary in your resume" (it was the first paragraph),
- "the order of your listed positions is not as it should be; make sure to chronologically sort your work experience; your most recent work experience should come first" (the positions in the resume were listed exactly as they wanted them to be).
The review was clearly LLM-generated (several pages long, with a lot of blah blah), even though it was stylised as though written by a real person. I have no idea why they don't even check the LLM output for obvious and very stupid errors, but it made me curious about how hard it is to accurately analyse a CV with an LLM. To make the experiment even more interesting, I decided to use a relatively small local LLM (Apple's 3B foundation model).
CVs are usually produced in PDF format (or can be converted into PDF format). Luckily it is very simple to extract text from a PDF file using PDFKit:
guard let document = PDFDocument(url: url) else { print("failed to open PDF file"); return }
var text = ""
for i in 0..<document.pageCount {
    if let page = document.page(at: i) {
        if let pageText = page.string {
            text.append(pageText)
        } else {
            print("failed to get page text (\(i))")
            return
        }
    }
}
The extracted text can be easily analysed using the on-device LLM as described in this article. The LLM can be made to generate a "native" Swift object (marked by the @Generable macro). I used the following structure to extract the basic info from a CV.
@Generable
struct CV {
    @Guide(description: "The person's full name.")
    let name: String
    let email: String?
    let phoneNumber: String?
    let summary: String?
    let technologies: [String]
    let skills: [String]
    let languages: [String]
    @Guide(description: "Hobbies and interests.")
    let interests: [String]
}
An instance of this structure can be created by the LLM as follows:
let session = LanguageModelSession(model: model)
let options = GenerationOptions(sampling: .greedy, temperature: 0.5)
let response = try await session.respond(to: prompt, generating: CV.self, options: options)
let cv = response.content
The cv variable now contains an instance of CV. I ran the code and all fields were extracted correctly.
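Once the fields are available as plain Swift values, the complaints from the introduction (missing email, phone number or summary) become simple deterministic checks rather than questions to put to the model. A minimal sketch, assuming a hypothetical `ContactInfo` struct that mirrors the optional fields of `CV` (declared without `@Generable` so the snippet compiles without the FoundationModels framework); `missingSections` is an illustrative helper name, not part of any API:

```swift
// Hypothetical plain mirror of the optional fields in `CV`,
// so that this sketch compiles without FoundationModels.
struct ContactInfo {
    let email: String?
    let phoneNumber: String?
    let summary: String?
}

// Returns the names of the fields that are nil, empty or whitespace-only.
func missingSections(in info: ContactInfo) -> [String] {
    func isBlank(_ value: String?) -> Bool {
        guard let value else { return true }
        return value.allSatisfy { $0.isWhitespace }
    }
    var missing: [String] = []
    if isBlank(info.email) { missing.append("email") }
    if isBlank(info.phoneNumber) { missing.append("phoneNumber") }
    if isBlank(info.summary) { missing.append("summary") }
    return missing
}
```

An empty result means all three fields were found; anything else names exactly what is missing, so a review can only complain about a field that is really absent.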
Let's now look at education. The following structure might work:
@Generable
struct Education {
    let title: String
    let institution: String
    let location: String
    let from: DateInfo
    let to: DateInfo
    let details: [String]
}
I've used a property defined as let education: [Education] in the CV structure. Again, all information was extracted correctly.
As for work experience, the following structure (the corresponding property in the CV structure is let worksExperience: [WorkExperience]) did the job flawlessly:
@Generable
struct WorkExperience {
    let title: String
    let from: DateInfo
    let to: DateInfo
    let details: [String]
}
Needless to say, the extracted items appeared in the same order as in the CV. This is generally how LLMs behave, but to verify the order one shouldn't ask the LLM whether it is correct; one should extract the list and then check the ordering in code.
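The ordering check itself can be a few lines of ordinary Swift. The sketch below is only an illustration: the definition of `DateInfo` is not shown in this post, so it uses a hypothetical `YearMonth` type as a stand-in and assumes each position's start date has already been reduced to a year and a month. `isMostRecentFirst` is likewise an illustrative name.

```swift
// Hypothetical stand-in for `DateInfo`, whose definition is not shown here.
struct YearMonth: Comparable {
    let year: Int
    let month: Int  // 1...12

    static func < (lhs: YearMonth, rhs: YearMonth) -> Bool {
        (lhs.year, lhs.month) < (rhs.year, rhs.month)
    }
}

// Returns true if the start dates are in reverse chronological order
// (most recent first). Empty and single-element lists count as ordered.
func isMostRecentFirst(_ startDates: [YearMonth]) -> Bool {
    zip(startDates, startDates.dropFirst()).allSatisfy { $0 >= $1 }
}

let starts = [
    YearMonth(year: 2022, month: 3),
    YearMonth(year: 2019, month: 9),
    YearMonth(year: 2015, month: 1),
]
print(isMostRecentFirst(starts))  // prints "true"
```

In practice the input would be built by mapping the start date of each extracted position into such a comparable value; how that mapping looks depends on the actual shape of `DateInfo`.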
The details property contained a correctly structured list of points for every position and could be analysed further (for example, for a list of skills and/or technologies used).
I can only hope that serious recruiters use better tools to analyse candidates' CVs than the one mentioned in the introduction to this post. It's really not that hard.
Studied physics & CS; PhD in NLP; interested in AI, HPC & PLT