AI Resume Screening Bias in 2026: A 33,000-Job Audit

AI resume screening bias is now documented at industrial scale. In the most-cited 2024 academic study, language-model-based resume rankers preferred white-associated names 85.1% of the time and Black-associated names just 8.6% across roughly 40,000 paired comparisons (Wilson & Caliskan, AIES 2024). Male-associated names were favored 51.9% to 11.1%. In head-to-head comparisons of Black men against white men, Black men were chosen in zero out of three model tests. Not “some bias.” Functional exclusion.

Pin audited the recruiting funnel that sits one step earlier in the process. Between January 2024 and May 2026, 33,000+ jobs and 37,000+ recruiter sourcing searches ran through our platform. In that corpus, employer-prestige filters appear in 70.7% of searches. Years-of-experience floors of 5 or more years account for 51.9% of all experience-gated searches. A single 12-month tenure default sits underneath roughly 96% of all tenure floors. So bias does not start at the ATS. Bias starts at the sourcing screen.

This article does three things. First, it lays out what the academic literature actually proves about AI resume screening bias in 2026, not what the marketing pages say. Second, it documents the legal architecture the EEOC, EU AI Act, NYC Local Law 144, Illinois HB3773, and Texas TRAIGA built around AI hiring tools in the last 24 months. Third, it shows what Pin’s own review of 33,000+ jobs reveals about how filter defaults quietly do most of the screening work before any AI model is asked to rank a single resume. Note: Pin numbers reflect sourcing-side recruiter behavior, not inbound apply rates. We flag the funnel difference every time it matters.

How We Audited 33,000+ Jobs and 37,000+ Sourcing Searches

Every active, non-test sourcing search created on Pin’s platform between January 1, 2024 and May 19, 2026 was eligible, after removing internal orgs, demo accounts, and obvious test rows. With those filters applied, 37,000+ recruiter sourcing searches remained, spanning 33,000+ unique open jobs across every industry Pin serves: tech, healthcare, life sciences, finance, professional services, and staffing agency books.

Pin counts what recruiters do when narrowing a candidate pool. Which filters they apply. Where they set the floor on years of experience. Whether they restrict by prior employer. And whether the job has a full job description (500+ characters) or a brief set of requirements (under 500 characters). This review does not measure applicant behavior. Nor does it measure who applied to a job board. What it measures is recruiter-driven screening decisions one step before any ATS or AI screener ever sees the resume.

For employment-gap prevalence, we ran a separate sample of 5,000,000+ candidate profiles drawn from Pin’s index. A gap was defined strictly: a documented 6+ month blank between two explicitly dated roles in the past five years. Conservative by design. Many real career gaps are not captured this way because candidates omit roles or leave dates blank entirely.
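The gap definition above can be sketched in a few lines. This is a minimal illustration, not Pin's actual pipeline: the function names, the `(start, end)` tuple layout, and the choice to count blank calendar months between month-granular dates are all assumptions made for the example. Roles with a missing date are skipped, mirroring the "explicitly dated roles" condition.

```python
from datetime import date

def blank_months(prev_end: date, next_start: date) -> int:
    """Calendar months strictly between two roles (assumes month-granular dates)."""
    return (next_start.year - prev_end.year) * 12 + (next_start.month - prev_end.month) - 1

def has_employment_gap(roles, as_of, min_gap_months=6, lookback_years=5):
    """roles: list of (start, end) date tuples; entries with a missing date are ignored."""
    dated = sorted((r for r in roles if r[0] and r[1]), key=lambda r: r[0])
    window_start = date(as_of.year - lookback_years, as_of.month, 1)
    latest_end = None
    for start, end in dated:
        # Only count a gap if the next role starts after every earlier role ended.
        if latest_end is not None and start > latest_end:
            if blank_months(latest_end, start) >= min_gap_months and start >= window_start:
                return True
        if latest_end is None or end > latest_end:
            latest_end = end
    return False

# Hypothetical profile: role ends January 2022, next role starts August 2022.
profile = [(date(2020, 1, 1), date(2022, 1, 31)), (date(2022, 8, 1), date(2024, 5, 1))]
print(has_employment_gap(profile, as_of=date(2026, 5, 1)))
```

Because undated roles are dropped rather than guessed at, a sketch like this undercounts real gaps, which matches the conservative intent described above.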

Scope Versus the UW Academic Study

Our review is not the University of Washington study. That paper is the academic anchor we extend. Where UW researchers tested embedding models in a controlled retrieval experiment, Pin observes what actually happens in production sourcing tools recruiters use every day. Where two findings converge, the convergence is evidence. Where they diverge, this article names the difference.

The short version:

  • AI resume screeners are documented to favor white-coded names 85.1% of the time and Black male names 0% of the time in head-to-head comparison with white male names (Wilson & Caliskan, AIES 2024). A 2025 follow-up using 3 million+ comparisons replicated the pattern and named the “Illusion of Neutrality” effect, where seemingly unbiased models match on keywords rather than substantive evaluation (Fairness Is Not Enough, arXiv 2025).
  • 70.7% of Pin’s 37,000+ audited recruiter sourcing searches apply an employer-prestige filter. That single filter, more than any other, narrows who is allowed to be seen before any AI scores a resume. Next on the list of bias-prone filters is a years-of-experience floor (45.7% of searches).
  • 96% of all minimum-tenure filters are set at exactly 12 months at the candidate’s current employer. The default, not the recruiter’s individual judgment, is doing the screening. EEOC adverse-impact analysis (the four-fifths rule from the 1978 Uniform Guidelines) treats facially neutral defaults as actionable when their outcomes disadvantage protected groups (EEOC Uniform Guidelines, 29 CFR Part 1607).
  • 51.9% of experience-gated searches require 5+ years; 12.4% require 10+. Inflated YoE floors are a textbook adverse-impact pattern. SHRM’s 2026 State of AI in HR survey of 1,908 HR professionals found that 19% of organizations using hiring automation report their tools have screened out qualified applicants (SHRM, 2026).
  • The legal architecture tightened materially. NYC Local Law 144 enforcement, the EEOC’s iTutorGroup settlement, Mobley v. Workday’s ADEA certification, Illinois HB3773, Texas TRAIGA, and the EU AI Act Annex III (fines up to €35M or 7% of global revenue) all now sit in the path of any team using AI to screen resumes (NYS Comptroller audit, 2025).

85.1% of LLM resume rankings favored white-associated names; Black names were favored just 8.6% (Wilson & Caliskan, AIES 2024).
70.7% of 37,000+ recruiter sourcing searches apply an employer-prestige filter before any AI scores a candidate (Pin audit, 2026).
96% of all minimum-tenure filters in the audit are set at exactly 12 months. The default does the screening (Pin audit, 2026).

What the 2024 UW Study Actually Found

Language-model resume rankers preferred white-associated names in 85.1% of paired comparisons. Black male names won 0% of head-to-head tests against white male names. Researchers ran roughly 40,000 paired comparisons across three open-weight embedding models. The most-cited academic anchor on this finding is Kyra Wilson and Aylin Caliskan’s “Gender, Race, and Intersectional Bias in Resume Screening via Language Model Retrieval,” presented at AIES 2024 with NIST funding (arXiv preprint; Brookings writeup). For practitioners weighing real-world deployments, the finding matters because the embedding models tested belong to the same family used inside production AI tools that screen resumes. The methodology is worth reading carefully because most secondary coverage gets it wrong.

Wilson and Caliskan took 554 real resumes and augmented them with 80 first names: 20 names per demographic group (Black women, Black men, white women, white men), with frequencies drawn from the U.S. Census. To isolate the first-name signal, they held the last name “Williams” constant across every resume. Ranking candidates against 571 job descriptions across nine occupations, they ran three open-weight embedding models: E5-Mistral-7B-Instruct, GritLM-7B, and SFR-Embedding-Mistral. About 40,000 paired resume-vs-JD comparisons ran per model.

Headline findings, replicated across all three models:

  • Race. White-associated names were ranked higher in 85.1% of all paired comparisons. Black-associated names were preferred in only 8.6%. Both groups tied in 6.3% of tests.
  • Gender. Male-associated names were preferred in 51.9% of comparisons. Female-associated names in 11.1%. Equal selection rates appeared in 37% of tests.
  • Intersectionality. When the comparison was Black male names versus white male names, Black male names were preferred in 0% of tests across all three models. This is the single most extreme finding in the paper. Black women fared better than Black men (~85% versus ~14.8%), suggesting that the interaction of race and gender signals in embedding space penalizes Black men disproportionately, an effect documented in the broader algorithmic fairness literature but rarely surfaced in HR coverage.

These rates would not survive an EEOC adverse-impact analysis. The 1978 Uniform Guidelines on Employee Selection Procedures define adverse impact as any group’s selection rate falling below four-fifths (80%) of the rate for the highest group. Black names’ 8.6% rate divided by white names’ 85.1% rate produces a ratio of 10.1%. That is far below the 80% threshold, by a factor of about eight.
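The arithmetic in the paragraph above can be checked directly. The sketch below treats the study's preference shares as if they were selection rates, which is how this article applies the rule; the function names are illustrative, not part of any EEOC tooling.

```python
def impact_ratio(group_rate: float, highest_rate: float) -> float:
    """Selection rate of a group divided by the rate of the highest-selected group."""
    if highest_rate <= 0:
        raise ValueError("highest_rate must be positive")
    return group_rate / highest_rate

def passes_four_fifths(group_rate: float, highest_rate: float, threshold: float = 0.8) -> bool:
    """True when the impact ratio meets the four-fifths (80%) threshold."""
    return impact_ratio(group_rate, highest_rate) >= threshold

ratio = impact_ratio(0.086, 0.851)   # Black-associated vs white-associated names
shortfall = 0.8 / ratio              # how many times below the threshold the ratio sits

print(f"impact ratio: {ratio:.1%}")
print(f"threshold is {shortfall:.1f}x the observed ratio")
print("passes four-fifths:", passes_four_fifths(0.086, 0.851))
```

The same two functions apply to any pair of selection rates, so they can be pointed at a sourcing filter's pass rates as easily as at a ranker's output.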

Wilson and Caliskan are not the only researchers to find this pattern. A 2025 follow-up titled “Fairness Is Not Enough” ran more than 3 million comparisons and replicated the 85.1% white-preference figure. It also identified the “Illusion of Neutrality” effect: some models that show statistically flat racial bias do so because they match on superficial keywords rather than substantive resume content. Apparent fairness can mask the absence of screening quality entirely. In April 2025, a separate Chaturvedi et al. paper ran 332,044 real job postings through mid-sized LLMs. The result: “most models favor men, especially for higher-wage roles,” with occupational segregation patterns reproduced by the model. Bertrand and Mullainathan’s original field experiment (NBER WP 9873, 2003) found a 50% interview-callback gap between white-coded and Black-coded names on real, human-reviewed resumes. So the AI version of that gap is not new behavior. It is human bias, retrained at scale.

For a general primer on how algorithmic bias enters AI systems beyond the hiring context, IBM Technology’s Martin Keen walks through the causes, real-world examples, and mitigation patterns that apply to any screening model.

Algorithmic Bias in AI: What It Is and How to Fix It (IBM Technology)

How Recruiters Reproduce the Same Bias When Sourcing

Bias enters the funnel one step before any AI ranker sees a resume. While the UW study measures what happens once a resume is in front of a model, Pin’s audit of 33,000+ jobs answers a different question: which resumes ever get there. The findings sit alongside what we cover in our guide to automated screening for candidates, where the same default-driven mechanics show up at the next funnel stage.

Across 37,000+ recruiter sourcing searches in Pin’s platform, the most common filter is not skills, not titles, not certifications. It is employer prestige. 70.7% of all sourcing searches restrict the candidate pool to people who have worked at a specific named company or set of companies. This is the recruiter version of the “graduate of a top-10 school” filter, applied not to education but to prior employer. Without time at a recognized brand-name employer, a candidate is filtered out before any AI screener ever sees their resume. Companion filters look similar. 54.9% restrict by seniority level. 45.7% set a years-of-experience floor. 30.9% filter by college major. 2.5% explicitly filter by graduation year, which is an age-proxy filter that runs straight into the EEOC’s ADEA jurisdiction.

A second pattern is even cleaner. Among recruiter sourcing searches that set a minimum tenure floor at the candidate’s current employer, 96% set the floor at exactly 12 months. Next most common is 24 months, which accounts for less than 1% of tenure filters. Recruiters did not arrive at the 12-month default through careful deliberation about what is reasonable. It is what the platform’s default value happens to be. A candidate returning from parental leave at month 10, switching companies after a 9-month contract, or recovering from a layoff at month 8 is filtered out before any AI model is asked anything. Under EEOC adverse impact analysis, facially neutral defaults are actionable when their effects fall disproportionately on protected classes (EEOC May 2023 technical assistance). Pin’s 12-month tenure default sits squarely in that category.
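A check for this kind of default concentration is straightforward to sketch. The example below uses made-up floor values chosen only to mirror the 96% figure; the function names and the 90% alert threshold are assumptions for illustration, not Pin's audit code.

```python
from collections import Counter

def floor_distribution(tenure_floors_months):
    """Share of searches at each tenure floor, most common first."""
    counts = Counter(tenure_floors_months)
    total = sum(counts.values())
    return {floor: n / total for floor, n in counts.most_common()}

def dominant_default(tenure_floors_months, alert_share=0.9):
    """Return (floor, share) when one value dominates, else None."""
    dist = floor_distribution(tenure_floors_months)
    if not dist:
        return None
    floor, share = next(iter(dist.items()))
    return (floor, share) if share >= alert_share else None

# Hypothetical sample: 100 tenure-gated searches.
floors = [12] * 96 + [6] * 2 + [24] + [18]
print(dominant_default(floors))

# Candidates from the examples above, by months at current employer.
excluded = [m for m in (10, 9, 8) if m < 12]
print(excluded)
```

When a single value accounts for nearly every floor, the platform default is the more likely explanation than independent recruiter judgment.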

[Figure: horizontal bar chart of Pin sourcing-search filter rates by filter type]

Years-of-experience floors are the next problem. Across the 15,000+ searches that set a YoE floor at all, 51.9% require 5 or more years and 12.4% require 10 or more. Median floors by function tell the same story: 5 years for engineering and finance, 4 years for sales, 10 years for executive leadership. For at least three decades, the EEOC and OFCCP have flagged inflated experience floors as a facially neutral practice with documented adverse impact on younger workers and career-changers. On a sourcing screen, the result is structural. Took a non-linear path? Did contract work for two years? Bootcamp instead of a CS degree? Returned from a career break? Filtered out before any AI ranker is asked anything.

All of this matters for the broader debate about AI resume screening bias because much of the public discussion is structured as if the AI is the source of the problem. Our data says the AI inherits a pool that human-designed filters have already narrowed. Identical-resume tests by Wilson and Caliskan show that AI screeners then layer additional bias on top. Both are real. Pin’s review speaks to the first half of the funnel. UW’s study speaks to the second.

The Compounding Filter Problem: Brief Requirements vs Full Job Descriptions

One finding in our data surprised us: what happens to filter rates when a sourcing search is backed by a full job description versus a brief set of requirements. We split the 37,000+ searches into three groups: full JD (500+ characters of description text), brief requirements (under 500 characters), and no description at all.

Search backing | Employer-prestige filter | YoE floor | College major filter
Full JD (500+ chars) | 74.8% | 49.5% | 35.7%
Brief requirements (<500 chars) | 49.0% | 35.4% | 9.7%
No description at all | 77.0% | 44.9% | 32.2%

A counterintuitive result sits in the bottom row. When recruiters have no job description at all, employer-prestige filtering is actually higher (77.0%) than when they have a full JD (74.8%). The likely mechanism is straightforward: with nothing to match candidates against on substance, the search defaults to pedigree. Where the JD exists and is substantial, employer-prestige filtering eases by about two points, although years-of-experience and college-major filters run slightly higher than in searches with no description (49.5% versus 44.9%, and 35.7% versus 32.2%). Where the JD is brief, filter rates are lowest across all three categories.

That observation has a practical implication. A team trying to broaden a search and reduce its dependence on pedigree filters should write a complete job description, not skip it. A blank requirements box does not free the search from filters. Instead, it hands the screening job to whichever heuristic the recruiter reaches for. And the most accessible heuristic in 2026 is still the employer name. Both directions matter here: AI sourcing tools and AI rankers work better when fed substantive criteria, and recruiter behavior likewise tilts toward fairer screening when the substance is present.

Compounding is the other concern. 31.3% of all sourcing searches in the audit stack both an employer-prestige filter and a years-of-experience floor simultaneously. Nearly one in three searches narrows the pool twice with two independent bias-prone heuristics before any AI scoring layer is engaged. Adverse-impact compounding is well-documented: when two facially neutral rules each individually filter out a protected group at slightly elevated rates, applying them sequentially multiplies the exclusion. The math follows from the four-fifths rule itself: impact ratios multiply across sequential stages, so two filters that each clear the 80% test on their own can fail it together. The EEOC has used compounded-impact arguments in successful litigation since the early 1980s.
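A small worked example makes the compounding concrete. The per-filter impact ratios below are hypothetical, and the multiplication assumes the two filters act independently, as the paragraph above describes; real filters are often correlated, so treat this as a sketch of the mechanism rather than an estimate.

```python
from math import prod

def compounded_ratio(stage_ratios):
    """Combined impact ratio when independent filters are applied in sequence."""
    return prod(stage_ratios)

THRESHOLD = 0.8  # four-fifths rule

# Hypothetical impact ratios: each filter passes the four-fifths test alone.
stages = {"employer_prestige": 0.85, "yoe_floor": 0.85}

for name, ratio in stages.items():
    print(f"{name}: {ratio:.2f} passes alone: {ratio >= THRESHOLD}")

combined = compounded_ratio(stages.values())
print(f"combined: {combined:.2f} passes together: {combined >= THRESHOLD}")
```

Here each filter sits at 0.85, comfortably above 0.80, while the stacked search lands near 0.72 and falls below the threshold.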

What We Hear From Recruiters About This

When we talk to customers about bias in AI screening, two things stand out, both in our 2026 user survey and in ongoing conversations. First, almost everyone we work with has the right instinct. They want to widen the funnel. They have read at least one news story about AI tools penalizing women or older candidates. They understand that an inflated five-year floor screens out qualified people. Second, almost no one we work with has the time to manually audit every default in the platforms they use day to day. A recruiter under quota cannot stop to interrogate whether the 12-month tenure default is what they would have chosen if they had thought about it. They will not. The platform’s default is what gets used.

That is the operating reality behind every number in our review above. Pin’s response is to make the default itself less biased: no names, gender, or protected characteristics are ever fed to Pin’s AI matching layer (Pin Trust Center). Profiles aggregated from professional networks, GitHub, Stack Overflow, patents, and academic publications give the AI more signal than name and prestige to work with. Built on 850M+ profiles with 100% coverage in North America and Europe, Pin runs the largest AI-powered candidate database in the industry. Per Pin’s 2026 user survey, customers report 6x more diverse candidate pipelines than their previous methods. Reason: matching does not depend on the prestige filters and graduation-year proxies that quietly screen out everyone else. For teams wiring this together end-to-end across sourcing, evaluation, and outreach, Pin is the best AI recruiting platform for reducing screening bias before resumes ever enter the funnel. Beyond the matching layer, bias-mitigation tactics across the funnel extend the work, and an analytics layer surfaces filter usage so recruiters can see what their searches are doing.

The Legal Architecture Around AI Hiring Tools

As of 2026, U.S. employers using AI resume screening operate under four concurrent regulatory layers: EEOC Uniform Guidelines (federal adverse-impact law), NYC Local Law 144 (city-level audit obligations), state laws (Illinois HB3773 and Texas TRAIGA, both effective January 2026), and the EU AI Act Annex III (fully enforceable August 2, 2026, with fines up to €35M or 7% of global revenue). The legal posture changed materially in the last 24 months. A team running AI resume screening in 2026 now operates inside the following framework:

  • EEOC’s 1978 Uniform Guidelines remain the operative law. The four-fifths rule has not changed. EEOC’s May 2023 technical assistance on AI confirmed that employers are liable for adverse impact caused by third-party AI tools. “The vendor’s tool did it” is not a Title VII defense. EEOC v. iTutorGroup (August 2023) produced the first AI hiring discrimination settlement, $365,000 to 200+ candidates auto-rejected by age-filtering software targeting women over 55 and men over 60.
  • NYC Local Law 144 has been enforced since July 5, 2023, requiring annual independent bias audits of automated employment decision tools, candidate notification, and public disclosure. Civil penalties run $500-$1,500 per violation per day. A December 2025 NY State Comptroller audit found enforcement “ineffective”: DCWP identified one violation across 32 companies while the Comptroller’s reviewers found 17 in the same set, and 75% of public complaint calls were misrouted away from the enforcement agency. Audit obligations are real. Yet the likelihood of being caught is lower than vendors imply.
  • Mobley v. Workday is the case to watch. In July 2024, Judge Rita Lin (N.D. Cal.) allowed “agent” theory claims to proceed, holding that Workday’s AI was “participating in the decision-making process” rather than merely implementing employer criteria. In May 2025, the court granted preliminary ADEA collective certification, accepting the disparate impact theory (Fisher Phillips summary). If it proceeds to merits, it will be the first substantial precedent on AI vendor liability under federal employment law.
  • Illinois HB3773 (effective January 1, 2026) requires employer notice when AI is used in hiring, prohibits ZIP codes as protected-class proxies, and imposes four-year recordkeeping. Texas TRAIGA (also January 1, 2026) prohibits AI deployed “with intent” to discriminate, with penalties of $10,000-$200,000 per violation enforced by the Texas AG. Disparate impact without intent does not violate TRAIGA. There is no private right of action.
  • The EU AI Act Annex III is the largest piece. Employment AI is classified as high-risk, and full obligations are enforceable from August 2, 2026: risk assessments, technical documentation, bias testing, human oversight, transparency, continuous monitoring. Fines reach €15M or 3% of global turnover for non-compliance and €35M or 7% for prohibited practices. The law applies based on where the candidate is, not where the employer is headquartered. A US employer running AI screening on a single Berlin-based applicant is in scope. The EU AI Act obligations for recruiting layer on top of GDPR rules for candidate data, which govern the underlying flows.

In 2026, the picture is no longer the EEOC alone. It is EEOC plus city law plus state law plus federal litigation plus EU regulation, all converging on the same activity.

UCLA’s Institute for Technology, Law & Policy offers a short academic framing of where algorithmic decision systems break. Useful context for any team building an audit log to meet the obligations above.

AI & Bias: When Algorithms Don't Work (UCLA Institute for Technology, Law & Policy)

What Vendor Bias Audits Actually Say (And Don’t)

Only Workday has published a substantive NYC Local Law 144 bias audit for its HiredScore Spotlight matching system; iCIMS earned TrustArc’s Responsible AI Certification in March 2025. Neither vendor publishes testable pass-rate or impact ratio data publicly, and Greenhouse, Lever, BambooHR, JazzHR, Manatal, Workable, and Taleo have no comparable published audits at all. The contents of the available disclosures are worth reading carefully.

Workday published a bias audit of its HiredScore Spotlight matching system covering September 2025 through February 2026 in the NYC area, conducted by independent auditor Secretariat. Its summary statement reads, “No evidence of disparate impact based on calculated impact ratios” (Workday Responsible AI page). The actual impact ratio table is referenced in the audit but not reproduced on the public-facing page. The audit covers five job profiles in a limited geography over a six-month window. Read in context, the disclosure satisfies the letter of NYC Local Law 144’s audit publication requirement but does not give a third-party analyst enough to independently verify the claim.

iCIMS became the first enterprise recruiting software to earn TrustArc’s Responsible AI Certification in March 2025, with annual bias audits, an AI Governance Committee, and Privacy by Design reviews. The certification is meaningful in process terms. The pass-rate numbers and impact ratios from the audits are not publicly disclosed.

A literature search for published bias audits of Greenhouse, Lever, BambooHR, JazzHR, Manatal, Workable, and Taleo returned nothing comparable to Workday’s NYC LL144 filing. Mobley is litigation, not an audit. SHRM’s 2026 State of AI in HR report found that 27% of organizations now use AI in recruiting, the highest adoption rate of any HR function. Within the same survey, 19% of organizations using hiring automation report their tools have screened out qualified candidates (SHRM, 2026). In effect, that 19% is a self-reported incidence of false negatives: the share of organizations that have seen it happen, not the share of candidates wrongly rejected. Either the underlying screening AI is missing real talent, or recruiters are catching it themselves and reporting it to surveyors. Either way it is a number the entire category should be paying closer attention to.

Across the category, vendor transparency on screening bias remains thin. Most vendors publish responsible-AI principles. Few publish testable pass-rate data. The EEOC’s adverse-impact analysis requires testable pass-rate data. That gap is where most of the next two years of AI hiring litigation is going to live.

How to Reduce AI Resume Screening Bias Across the Funnel

Six controls do most of the work for AI resume screening bias in 2026. Strip identifying information before AI ranking. Review default filter settings quarterly. Require a complete job description. Pick screening vendors that publish impact ratios. Document every AI-driven decision. Track diversity outcomes at the source stage rather than only at hire. Together, these controls map to the funnel itself: who gets into the candidate pool, who survives recruiter screening filters, who survives AI ranking, and who reaches the human reviewer. Points of intervention sit at each stage.

  1. Strip identifying information before AI ranking. Names, photos, and graduation years are the strongest proxies for race, gender, and age in resume data. Any AI tools that screen resumes should default to processing candidates without these fields, or with them masked. Pin’s matching layer is built this way: no demographic data, no names, no protected characteristics are fed to the AI at any point.
  2. Audit your default filter settings. The 12-month tenure default. The 5-year YoE floor. The “FAANG or equivalent” prior-employer filter. Each of these is an EEOC adverse-impact argument waiting to be made. Set a calendar reminder once per quarter to look at what your sourcing tool defaults to and ask whether you would have chosen those defaults if you had thought about them.
  3. Require a real job description, not a brief. The audit data shows that searches with no description at all lean hardest on pedigree filters: 77.0% apply an employer-prestige filter, versus 74.8% of searches backed by a full JD. Without a JD to work from, the path of least resistance is the prior-employer filter. Writing the JD is, mechanically, a bias-reduction move.
  4. Use a screening vendor that publishes its impact ratios. Vendor certifications and responsible-AI pages are necessary but not sufficient. The NYC LL144 disclosure model, even with its current enforcement gaps, is the floor: a vendor that cannot reproduce its own impact ratio table is a vendor that cannot prove its system passes the four-fifths rule.
  5. Document every AI-driven decision. Illinois HB3773 already requires four-year recordkeeping. The EU AI Act requires more. Logging which candidates were surfaced, ranked, filtered, and why is both a compliance requirement and a debugging tool. Through those logs, you discover that your tenure default is doing the work you thought your AI was doing.
  6. Track diversity outcomes at the source stage, not just the hire stage. Hiring outcomes show what passed every filter. Source-stage outcomes show what your filters allowed in. Between the two sits where bias lives. Pin’s analytics surface this view by default because it is where most teams find their largest unaddressed problem.
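
Controls 1 and 5 above lend themselves to a short sketch: mask identifying fields before a profile reaches any ranker, and write a structured record for each screening decision. Everything here is hypothetical, including the field names, the record layout, and the function names; it is not Pin's implementation, and the statutes cited above do not prescribe a log format.

```python
import json
from datetime import datetime, timezone

# Hypothetical field names for the proxies named in control 1.
MASKED_FIELDS = {"name", "photo_url", "graduation_year"}

def mask_profile(profile: dict) -> dict:
    """Return a copy of the profile without identifying fields."""
    return {k: v for k, v in profile.items() if k not in MASKED_FIELDS}

def log_decision(candidate_id: str, stage: str, outcome: str, reason: str, filters: dict) -> str:
    """Serialize one screening decision as a JSON line for an audit log."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "candidate_id": candidate_id,
        "stage": stage,        # e.g. "sourcing_filter" or "ai_ranking"
        "outcome": outcome,    # e.g. "surfaced", "ranked", "filtered"
        "reason": reason,
        "filters": filters,    # the filter values in force, defaults included
    }
    return json.dumps(record, sort_keys=True)

profile = {"name": "A. Williams", "graduation_year": 2009, "skills": ["sql", "python"]}
masked = mask_profile(profile)
line = log_decision("cand-001", "sourcing_filter", "filtered",
                    "tenure below floor", {"min_tenure_months": 12})
print(masked)
print(line)
```

Recording the filter values alongside each outcome is the design choice that matters: it lets a later review separate exclusions caused by a default from exclusions caused by the ranking model.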

Frequently Asked Questions

Is AI resume screening biased in 2026?

Yes, by published academic measurement. The most-cited 2024 study found language-model-based resume rankers preferred white-associated names 85.1% of the time and Black male names 0% in head-to-head tests against white male names (Wilson & Caliskan, AIES 2024). A 2025 replication using 3 million+ comparisons reproduced the pattern. Pin’s audit of 33,000+ jobs adds that bias also enters the funnel one step earlier, where 70.7% of recruiter searches apply employer-prestige filters before any AI ranker sees a resume.

How does AI resume screening discriminate?

Through two documented mechanisms. First, training data: AI rankers learn from historical hiring decisions that reflect human bias. Second, embedding-space artifacts: name signals interact with seniority and skill signals in ways that produce intersectional penalties, most severely for Black men (0% selection rate against white men across all three models in the UW study). Pin’s audit data shows a third mechanism upstream of the AI: recruiter filter defaults narrow the pool before any AI scores anything.

Is AI in hiring illegal?

Not illegal per se. Regulated. EEOC’s 1978 Uniform Guidelines apply the four-fifths rule to AI selection tools just as to any other selection device. NYC Local Law 144, Illinois HB3773, Texas TRAIGA, and the EU AI Act Annex III all impose specific obligations on AI used in employment decisions. In practice, the legal question is whether the specific tool, on the specific data, in the specific jurisdiction, produces adverse impact and complies with the documentation, review, and disclosure obligations that apply.

What is the four-fifths rule?

Codified in 1978 by the EEOC, the four-fifths rule is the federal standard for identifying adverse impact in employee selection procedures. Selection rates below 80% of the highest-selected group’s rate are generally regarded as evidence of adverse impact and trigger a Title VII analysis (EEOC Q&A on the Uniform Guidelines). The rule applies to AI screening tools the same way it applies to written tests.

Can recruiters use AI without bias risk?

Risk cannot be reduced to zero, but it can be managed. The audit-and-document approach is what the EEOC, EU AI Act, and NYC LL144 all converge on. Select tools that publish impact ratios. Review defaults quarterly. Strip demographic signals from inputs. Log every screening decision. Track diversity outcomes at the source stage rather than the hire stage.

Reducing AI resume screening bias in 2026 is no longer optional. The academic evidence is clear, the regulatory architecture is in place, and the operational fix is well understood. Teams that get ahead of it are those whose sourcing tools, screening logic, and audit logs were designed for the 2026 legal posture from the start. Pin is built for that posture, and the audit numbers above are why.