It started with a PDF dropped into our clinical team’s Slack channel. Jordan Wright, our Chief Clinical Officer at Parallel Learning, shared a systematic review concluding that the dominant method for identifying learning disabilities in American schools performs “at the level of chance.” The conversation that followed, between clinical managers, speech-language pathologists, and an engineer who kept pushing for a clean algorithm, turned into the most productive diagnostic argument I’ve witnessed inside a company. This article is the longer version of that argument. It won’t end with a formula.

For nearly fifty years, American schools have tried to reduce learning disability identification to arithmetic. Administer an IQ test, administer an achievement test, compute the gap. If the gap was large enough, the child had a Specific Learning Disability. If not, tough luck. This IQ-achievement discrepancy model was codified in 1977 under the Education for All Handicapped Children Act and dominated practice across all fifty states for three decades. Its appeal was administrative clarity. Its fatal flaw: it required children to fail badly enough, for long enough, to produce a statistically significant gap. Critics called it “wait to fail.” By the time a third or fourth grader’s scores diverged enough to cross the threshold, the most critical windows for early literacy intervention had already closed.

The scientific case against discrepancy accumulated steadily. Children who met the criteria were largely indistinguishable from low achievers who didn’t, on reading measures, phonological processing, neuroimaging, and intervention response alike. An OSEP researcher roundtable concluded that “IQ-achievement discrepancy is neither necessary nor sufficient for identifying individuals with SLD.” The model measured the size of a gap against baseline IQ; it said nothing about why a student was struggling or what to do about it.

When Congress reauthorized IDEA in 2004, states were explicitly prohibited from requiring the discrepancy model. They were required to permit Response to Intervention approaches. And they were given a third option, “other alternative research-based procedures,” an opening that a new generation of formulaic models would walk through.

The Pattern of Strengths and Weaknesses model emerged in the early 2000s as the discrepancy model’s more ambitious replacement. Where discrepancy asked one question (is the gap big enough?), PSW asked three: Does the student have a cognitive processing weakness? Does that weakness align with an academic deficit? Are other cognitive abilities relatively intact? Three major variants developed: the Concordance/Discordance Model, Cross-Battery Assessment, and the Dual Discrepancy/Consistency method. All shared the same premise: if you could identify which cognitive process was deficient, you could target interventions more precisely.

The theory was appealing. The data have not cooperated. A 2025 systematic review, the one Jordan shared, concluded that PSW approaches identify SLD “at the level of chance.” The authors analyzed diagnostic accuracy across PSW methods, instruments, and data types (real and simulated) and recommended that practitioners “consider alternatives” because PSW “may not be worth the time or effort.”

The psychometric problems run deep. Simulation studies found that PSW methods flag only 1 to 2 percent of the population as having an SLD, despite epidemiological evidence suggesting dyslexia alone affects 7 to 15 percent of students. The models create an artificial bottleneck, not because learning disabilities are rare, but because the probability of simultaneously meeting multiple aligned score thresholds is mathematically tiny.
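You can see the bottleneck without any clinical data at all. The sketch below is a toy simulation, with invented correlations, cutoffs, reliabilities, and a made-up joint rule rather than any published PSW operationalization. It shows how requiring several aligned thresholds compresses the flagged group, and it previews the battery-to-battery instability discussed next:

```python
# Toy simulation of joint score thresholds (all parameters invented).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Latent true scores for three constructs a PSW-style rule might require
# to align: a cognitive processing weakness, an academic deficit, and
# otherwise-intact ability. Standard scores: mean 100, SD 15.
cov = 15**2 * np.array([[1.0, 0.6, 0.5],
                        [0.6, 1.0, 0.5],
                        [0.5, 0.5, 1.0]])
true = rng.multivariate_normal([100, 100, 100], cov, size=n)

def observe(true_scores, reliability=0.90):
    """Add measurement error, as if administering one test battery."""
    error_sd = 15 * np.sqrt(1 - reliability)
    return true_scores + rng.normal(0, error_sd, true_scores.shape)

def psw_flag(scores):
    """Hypothetical joint rule: weak processing AND weak achievement
    AND broadly average ability elsewhere."""
    processing, achievement, other = scores.T
    return (processing < 85) & (achievement < 85) & (other > 90)

battery_a = psw_flag(observe(true))
battery_b = psw_flag(observe(true))  # a second, correlated battery

# Positive agreement: of students flagged by either battery, how many
# were flagged by both?
either = battery_a | battery_b
agreement = (battery_a & battery_b).sum() / either.sum()

print(f"flagged by battery A: {battery_a.mean():.1%}")  # a few percent
print(f"positive agreement A vs B: {agreement:.2f}")    # well below 1.0
```

The exact numbers move with the assumed correlations and reliabilities, but the shape of the result doesn’t: joint thresholds compress the flag rate well below plausible prevalence, and measurement error near the cutoffs makes agreement between two equally defensible batteries fragile.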
Decision instability compounds the accuracy problem. When researchers applied the same PSW criteria using different achievement test batteries on the same students, classification decisions changed. The specific students flagged shifted depending on which tests the evaluator happened to select: correlated instruments produce slightly different composite scores that fall on different sides of arbitrary thresholds. Positive agreement across batteries ranged from 0.33 to 0.59, unreliable enough to make eligibility decisions effectively arbitrary.

And PSW classification doesn’t predict how students respond to intervention. In a direct test, LD status identified through both major PSW methods showed no association with differential treatment response to intensive reading instruction. Pretest reading scores already contained everything PSW was supposed to add. The cognitive profiling gatekept access to services without guiding what those services should be.

I pushed Jordan on the reproducibility question. If not a statistical model, wouldn’t clinical judgment vary drastically from one practitioner to the next? His response reframed the problem: “Humans are messy and unfortunately not statistical. Identifying an SLD is actually more of a thesis and argument, not a right/wrong decision. Actuarial data support the argument, but shouldn’t be making the decision.”

The reason algorithms keep failing is biological. Specific Learning Disabilities aren’t single-deficit conditions with clean cognitive signatures. Bruce Pennington’s multiple deficit model argues that neurodevelopmental disorders like dyslexia emerge probabilistically from multiple interacting risk factors: phonological processing, orthographic processing, rapid automatized naming, working memory, processing speed. There is no single necessary or sufficient cognitive cause.

This shows up within each named subtype. Not every child with dyslexia struggles with the same thing. Some have phonological processing weaknesses. Others show relatively spared phonology but weak naming speed or poor fluency. The International Dyslexia Association’s 2025 revised definition, the first update since 2002, formally acknowledged this: “underlying difficulties with phonological and morphological processing are common but not universal.” Dyslexia exists “along a continuum of severity,” and its causes “involve combinations of genetic, neurobiological, and environmental influences that interact throughout development.” The same heterogeneity shows up in dyscalculia and dysgraphia, each encompassing multiple distinct deficit profiles that respond to different interventions.

I tried the diabetes analogy: two people diagnosed with diabetes typically receive similar interventions built on stress-tested methods. Jordan’s correction was precise: “Not every kid with dyslexia is struggling with exactly the same thing, which may be qualitatively different than, say, diabetes. So two kids with dyslexia may not need the same intervention.”

Medicine itself has moved away from binary cutoffs for heterogeneous conditions. Depression, pain disorders, and hypertension are managed across continua with individualized thresholds. SLD diagnosis lacks even the biomarker specificity that makes medical thresholds defensible.
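Before turning to how this plays out in policy, it’s worth making the multiple deficit point concrete. The sketch below is a toy liability-threshold simulation with invented weights, correlations, and cutoffs, a cartoon of Pennington’s idea rather than his actual model: risk accumulates across several partial deficits, the disorder is the upper tail of a continuous liability, and no single deficit appears in every affected child:

```python
# Toy multiple-deficit (liability-threshold) simulation.
# All weights, correlations, and cutoffs are illustrative inventions.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
factors = ["phonological", "orthographic", "naming_speed",
           "working_memory", "processing_speed"]

# Modestly intercorrelated risk factors, standardized (mean 0, SD 1),
# oriented so higher scores mean greater impairment.
k = len(factors)
cov = np.full((k, k), 0.3) + 0.7 * np.eye(k)
z = rng.multivariate_normal(np.zeros(k), cov, size=n)

# Liability is a weighted sum of the factors plus unshared influences;
# "affected" is simply the upper tail of that continuous liability.
weights = np.array([0.35, 0.20, 0.25, 0.10, 0.10])
liability = z @ weights + rng.normal(0, 0.5, n)
affected = liability > np.quantile(liability, 0.90)  # ~10% prevalence

# Among affected children, how many show each individual deficit
# (that factor more than 1 SD worse than the population mean)?
for name, col in zip(factors, z.T):
    share = (col[affected] > 1.0).mean()
    print(f"{name:17s} deficit in {share:.0%} of affected children")
# Each deficit is overrepresented among affected children,
# but none appears in every case.
```

Run it and every factor shows up in well under 100 percent of affected children, which is exactly the pattern that defeats any gating rule built around one required deficit profile.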
Formulaic identification also plays out as a geographic lottery. IDEA’s “must not require / must permit / may permit” language left states and districts with remarkable latitude. The vast majority of states still allow the discrepancy model as an option. At least 35 states include PSW language as a permitted method. Policy reviews have found dozens of different discrepancy formulas within a single state, meaning the same student could qualify in one district and be denied in the next.

A recent experimental study made the inconsistency vivid: when school psychologists evaluated the same psychoeducational vignette under different identification model conditions, the identification model did not predict the SLD decision. What did? The evaluator’s demographics and years of experience. The framework mattered less than the person applying it.

This creates real casualties. Any model with low sensitivity systematically misses students with persistent, impairing skill deficits. When program access depends on crossing a threshold rather than on functional need plus intervention response, false negatives become service denials by design.

One of our clinical directors described the pattern she sees repeatedly: parents pay thousands for private neuropsychological evaluations that document clear dyslexia under DSM-5 criteria, then bring those evaluations to their school district, which runs its own assessment under different IDEA-based criteria and reaches a different eligibility determination. The legal mechanisms for resolving this, IEP meetings, independent educational evaluations, due process hearings, are often inaccessible to families without resources.

A clinical manager on our team raised a compliance dimension that rarely surfaces publicly. She has experienced pressure from special education teams to qualify students under Speech-Language Impairment when it looks like they won’t qualify under SLD criteria, often because the PSW pattern isn’t there. Many “speech only” students end up receiving services well beyond speech-language intervention. If a district were audited, the mismatch between eligibility category and actual service delivery creates Child Find exposure: IDEA requires districts to identify, locate, and evaluate all children with disabilities, under the right categories, not the convenient ones.

The Ontario Psychological Association published guidelines in 2018 that formalize what structured clinical reasoning looks like in practice: a 10-step assessment process that balances actuarial data with explicit clinical judgment requirements. Jordan cited these guidelines as his preferred framework. What makes them distinctive is a mandatory formulation step. Clinicians must synthesize quantitative and qualitative evidence into a coherent diagnostic narrative that connects findings to individualized recommendations. Scores inform but don’t determine the diagnosis. The guidelines state directly that diagnosing a learning disability is “more complex than simply providing scores and cut-off points.”

The OPA framework requires five conditions: a history of below-expected academic functioning, standardized achievement scores at least one standard deviation below the mean, evidence of underlying cognitive processing differences, at least average thinking and reasoning abilities, and exclusion of other primary causes. But it explicitly empowers clinicians to exercise judgment when scores fall in borderline ranges, acknowledging that language-heavy IQ tests often artificially suppress the scores of students whose reading and language disabilities are the very thing being assessed.
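This kind of framework is representable in software without smuggling the decision back into a formula. The sketch below is a hypothetical record structure, loosely inspired by the OPA’s five conditions; the type and field names are mine, not the OPA’s. Thresholds surface as advisory flags, and the diagnosis lives in a clinician-authored formulation rather than a computed boolean:

```python
# A sketch of "scores inform but don't determine" as a data model.
# Structure and names are hypothetical, loosely following the OPA's
# five conditions; nothing here computes an eligibility verdict.
from dataclasses import dataclass

@dataclass
class SLDFormulation:
    # Actuarial inputs: recorded and surfaced to the clinician.
    academic_history_below_expected: bool
    achievement_standard_scores: dict[str, float]  # e.g. {"reading": 82}
    processing_evidence: list[str]                 # converging sources
    reasoning_in_average_range: bool
    exclusionary_factors_ruled_out: bool
    # The mandatory step: a clinician-authored narrative tying the
    # evidence to individualized recommendations. There is deliberately
    # no `eligible: bool` derived from the scores above.
    formulation: str = ""
    limits_of_certainty: str = ""

    def flags(self) -> dict[str, bool]:
        """Surface threshold checks as advisory flags, not a decision."""
        return {
            name: score <= 85  # ~1 SD below a mean of 100, SD 15
            for name, score in self.achievement_standard_scores.items()
        }
```

The design choice is the absence: no field holds an auto-computed eligibility answer, so the record keeps the reasoning auditable, with flags, converging evidence, and stated limits of certainty, while the decision stays human.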
The psychometrics are necessary. They’re just not sufficient on their own, especially when the consequences of misclassification are service access, accommodation eligibility, and long-term trajectories.

The legitimate concern is calibration. How do you prevent clinical judgment from becoming uncalibrated variability? The OPA framework addresses this by requiring multiple converging data sources, explicit attention to limits of certainty, and documentation norms that make reasoning auditable. Increasing discretion can increase inequity unless practice is standardized through training and accountability. But the formulaic alternative hasn’t solved the consistency problem either. The experimental data showing that evaluator demographics predict decisions better than model choice suggest that rigid frameworks create an illusion of objectivity without delivering it.

For those of us building technology in this space, the research points to a specific and constrained role. AI can’t resolve the construct validity problems that plague algorithmic identification. If PSW models produce inconsistent classifications, an AI system trained on PSW-labeled data will automate and scale those inconsistencies. Where technology does contribute is in the data layer that supports clinical reasoning: universal screening to flag at-risk students, progress monitoring to track intervention response over time, adaptive assessment to reduce floor and ceiling effects. AI-powered handwriting analysis and oral reading assessment are showing promise as screening signals, but they’re risk indicators, not diagnostic endpoints. The strongest opportunity sits in the assessment-to-intervention loop: feeding adaptive assessment data into clinical workflows and generating the longitudinal instructional response profiles that the hybrid identification approach now favors.

A 2025 article framed SLD identification as a “wicked problem,” one that resists clean algorithmic solutions because it involves heterogeneous constructs, contested measurement, value-laden eligibility decisions, and irreducible clinical complexity. Wicked problems aren’t solved by better formulas. They’re managed through structured reasoning, continuous evidence, and the professional judgment of people who understand both the data and the child sitting in front of them.

Jordan put it plainly: “Actuarial data support the argument, but shouldn’t be making the decision.” That distinction, between data that informs and algorithms that decide, is where the future of SLD identification turns.