Research Methodology

How Scores Are Calculated

From question responses to dimension scores and composite measures

The Scoring Framework

Each assessment produces a set of dimension scores (one per dimension, on a 0-to-100 scale) and a single composite score that summarises the overall result. These numbers are not subjective ratings: they are calculated mechanically from your responses using a consistent formula applied to every participant.

Scores are used for two purposes: assigning your archetype and enabling peer comparison. Understanding how they are calculated helps you interpret what a high or low score actually means.

Question Types and Their Role in Scoring

Five question types appear across the three studies. Each plays a different role in the scoring calculation.

Tradeoff Pairs

The primary scoring input. Each question presents two statements and asks which better describes your experience, on a scale from "Definitely A" to "Definitely B" with "Equal" in the middle. Internally, the responses are coded from -2 (strongly A) to +2 (strongly B).

Neither option is designed to be obviously correct. The forced choice between two valid approaches counters the acquiescence bias (the tendency to agree with everything) that undermines many agreement-based surveys. Your genuine preference is revealed by where you land when both options have merit.

Every tradeoff pair maps to one specific dimension. The dimension score is built primarily from the mean of these paired responses.
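The coding and averaging described above can be sketched in Python. This is an illustrative reconstruction: the labels for the intermediate response options ("Somewhat A" and "Somewhat B") are assumptions, since the text names only the endpoints and the midpoint.

```python
# Illustrative sketch of tradeoff coding. The intermediate option labels
# ("Somewhat A"/"Somewhat B") are assumed; the text names only the endpoints
# ("Definitely A"/"Definitely B") and the midpoint ("Equal").
TRADEOFF_CODES = {
    "Definitely A": -2,
    "Somewhat A": -1,
    "Equal": 0,
    "Somewhat B": 1,
    "Definitely B": 2,
}

def code_tradeoff(response: str) -> int:
    """Return the internal -2..+2 code for one tradeoff response."""
    return TRADEOFF_CODES[response]

def dimension_raw_mean(responses: list[str]) -> float:
    """Mean of the coded tradeoff responses mapped to one dimension."""
    codes = [code_tradeoff(r) for r in responses]
    return sum(codes) / len(codes)
```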

Scenario Questions

Realistic workplace situations with three follow-up questions each. Scenarios use the same -2 to +2 response scale as tradeoffs, but they contribute to dimension scores at a reduced weight (0.3 to 0.5 of a tradeoff's weight) because they measure behavioural tendency rather than direct preference.

Each scenario question is followed by a confidence probe: "How confident are you in this answer?" The confidence rating is not included in dimension scores but is used to detect patterns across the assessment.

MaxDiff Questions

From a list of statements, you select which is most like your experience and which is least like your experience. MaxDiff produces sharper differentiation than rating scales because it forces a genuine ranking rather than allowing everything to cluster near the top.

MaxDiff responses do not feed directly into dimension scores. They are used as signals in archetype assignment, helping to differentiate between profiles that have similar dimension scores but different underlying patterns.

Likert Items

Agreement statements rated on a five-point scale from Strongly Disagree to Strongly Agree. Likert items provide absolute intensity measures that complement the relative comparisons from tradeoff pairs.

Like MaxDiff, Likert responses do not feed directly into dimension scores. They serve as modifiers and tiebreakers in archetype assignment, and they are used to detect paradoxes where stated attitudes contradict measured behaviour.

Confidence Probes

A three-point response collected after each scenario question: Easy (clear, settled orientation), Hard to decide (genuinely torn), or Neither fits (the framework does not match your situation). Six probes are collected per study.

Confidence patterns reveal where you are certain and where you face genuine ambiguity. A cluster of "Hard to decide" responses on a particular dimension signals a real tension point worth exploring.

How Dimension Scores Are Calculated

All dimension scores follow the same general pattern: start with the mean of the tradeoff pairs that map to that dimension, add weighted contributions from relevant scenario questions, then normalise the result to a 0-to-100 scale.

The Calculation in Plain English

  1. Average the tradeoffs. For each dimension, take the mean of the two or three tradeoff responses that belong to it. This gives a raw score between -2 and +2.
  2. Add scenario contributions. Relevant scenario questions contribute at a fraction of their face value (typically 0.3 of a tradeoff). This shifts the raw score slightly based on how you respond to realistic workplace situations.
  3. Normalise to 0-100. The raw score (now covering a wider range, because scenarios extended it) is rescaled so that the minimum possible raw score maps to 0 and the maximum maps to 100.

Scores are always clamped to the 0-100 range. It is not possible to score below 0 or above 100.
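Under the stated assumptions (a 0.3 scenario weight and a raw scale that is symmetric around zero), the three steps plus clamping can be sketched as follows. The exact formula is a reconstruction from the description above, not the published implementation.

```python
def dimension_score(tradeoffs, scenarios, scenario_weight=0.3):
    """Illustrative 0-100 dimension score from coded responses.

    tradeoffs: -2..+2 codes for the dimension's tradeoff pairs
    scenarios: -2..+2 codes for the relevant scenario questions
    scenario_weight: fraction of a tradeoff's weight (0.3-0.5 in the text)
    """
    # Step 1: mean of the tradeoff codes gives a raw score in [-2, +2].
    raw = sum(tradeoffs) / len(tradeoffs)
    # Step 2: add the down-weighted scenario mean, widening the raw range.
    if scenarios:
        raw += scenario_weight * (sum(scenarios) / len(scenarios))
    # Step 3: rescale so the minimum possible raw score maps to 0 and the
    # maximum to 100, then clamp to the 0-100 range.
    max_raw = 2 + (2 * scenario_weight if scenarios else 0)
    score = (raw + max_raw) / (2 * max_raw) * 100
    return max(0.0, min(100.0, score))
```

In this sketch, a respondent who answers every question at the B extreme scores 100, a perfectly balanced respondent scores 50, and clamping guarantees the published 0-100 bounds.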

What the 0-100 Scale Means

Each dimension has a direction. One end of the scale (low scores, nearer to 0) represents one pattern of responses; the other end (high scores, nearer to 100) represents the opposite pattern. The specific meaning of each pole depends on the study.

A score of 50 means your responses were evenly balanced between the two poles. It is not a neutral or average score in a normative sense: it simply means you did not lean consistently toward either end.

Per-Study Scoring Details

Study 1: Will AI Replace Me? (AI Vulnerability)

Four dimensions are scored from 12 tradeoff pairs (three per dimension) plus six scenario questions. Each dimension covers a different facet of role exposure to AI disruption.

Task Exposure
  Low score: production-focused work (creating content, running processes)
  High score: curation-focused work (selecting, evaluating, synthesising)

Skill Replaceability
  Low score: routine, pattern-based skills (easier for AI to replicate)
  High score: novel, synthesis-based skills (harder for AI to replicate)

Adaptation Speed
  Low score: individual execution focus (working alone on defined tasks)
  High score: coordination focus (enabling others, bridging functions)

Organisational Buffer
  Low score: explicit, documentable knowledge (can be codified and automated)
  High score: tacit, judgement-based knowledge (harder to replicate)

Composite: Vulnerability Index. The four dimension scores are combined with different weights to produce a single Vulnerability Index from 0 to 100. The index is designed so that higher B-pole scores (curation, novelty, coordination, tacit knowledge) reduce vulnerability. Task Exposure carries the largest weight (30%), followed by Skill Replaceability and Adaptation Speed (25% each), and Organisational Buffer (20%). A high Vulnerability Index indicates greater exposure to AI disruption. A low index indicates stronger natural defences.
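With the weights stated above, the index might be computed as in the sketch below. Inverting each dimension (100 minus its score) encodes the stated design that higher B-pole scores reduce vulnerability; the exact published formula may differ.

```python
# Weights from the text. Treating vulnerability as the weighted average of
# the inverted dimension scores is an assumption consistent with the stated
# design (higher B-pole scores reduce vulnerability), not the published formula.
VULNERABILITY_WEIGHTS = {
    "task_exposure": 0.30,
    "skill_replaceability": 0.25,
    "adaptation_speed": 0.25,
    "organisational_buffer": 0.20,
}

def vulnerability_index(scores: dict[str, float]) -> float:
    """Weighted 0-100 index: higher means greater exposure to AI disruption."""
    return sum(w * (100 - scores[dim])
               for dim, w in VULNERABILITY_WEIGHTS.items())
```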

Study 2: Are We Adopting AI Fast Enough? (AI Adoption)

Three main dimensions are scored from 10 tradeoff pairs plus six scenario questions. A fourth dimension, Future Orientation, is derived separately and excluded from the composite.

Usage Depth
  Low score: AI is embedded in existing tools and processes (IT-selected)
  High score: AI is self-selected and used autonomously outside standard workflows

Tool Breadth
  Low score: AI impact is individual (your own productivity only)
  High score: AI impact extends to the team (shared workflows, coordination)

Integration Level
  Low score: AI used for predictable, checklist-driven tasks
  High score: AI used adaptively for exploration, reasoning, and judgement

Future Orientation
  Low score: assumes better AI tools will solve coordination problems automatically
  High score: sees structural design as essential (AI works best when workflows are deliberately built around it)

Composite: Simple average. The composite for Study 2 is the straightforward average of Usage Depth, Tool Breadth, and Integration Level. Future Orientation is excluded because it is diagnostic rather than a direct measure of current adoption depth. A high composite indicates mature, broad AI integration. A low composite indicates early-stage or narrow adoption.
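As a minimal sketch of the Study 2 composite:

```python
def adoption_composite(usage_depth: float, tool_breadth: float,
                       integration_level: float) -> float:
    """Study 2 composite: the plain mean of the three scored dimensions.
    Future Orientation is deliberately excluded, as described above."""
    return (usage_depth + tool_breadth + integration_level) / 3
```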

Study 3: What's Holding Back My Use of AI? (Structural Friction)

Three friction dimensions are scored from 10 tradeoff pairs plus six scenario questions. Study 3 uses a different scoring logic from the other two studies.

Each tradeoff pair in Study 3 pits two friction types against each other: Activation versus Knowledge, Knowledge versus Decision, or Activation versus Decision. Your response determines how much each friction type receives from that pair. Leaning strongly toward one option does not mean the other is absent: it means one is more prominent than the other in your experience.
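One plausible reading of this allocation, sketched below, splits each pairwise response linearly between the two friction types it pits against each other. The linear split is an assumption: the text does not specify the exact rule, and each friction type also accumulates credit from its other pairings across the study.

```python
def allocate_pair(response: int) -> tuple[float, float]:
    """Split one Study 3 pairwise response between its two friction types.

    response: -2 (strongly the A-side friction) .. +2 (strongly the B-side).
    Returns (a_share, b_share), which sum to 1. The linear split is an
    illustrative assumption, not the published allocation rule.
    """
    b_share = (response + 2) / 4
    return (1 - b_share, b_share)
```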

Activation Friction measures barriers to getting started: waiting for approvals, chasing people, and coordination overhead before work can begin.

Knowledge Friction measures gaps in accessible knowledge: scattered information, expertise trapped in specific people, and documentation that does not exist.

Decision Friction measures constraints on decisions: reasoning that was never recorded, decisions that get revisited because stakeholders were excluded, and conflicting directions.

Composite: Maximum friction score. The composite for Study 3 is the highest of the three friction dimension scores, not the average. This design reflects a key insight: the dominant friction type defines the overall friction experience. If Activation Friction is 85 and the other two are 30, the composite is 85. Averaging would obscure the severity of the dominant barrier.
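The max rule itself is straightforward:

```python
def friction_composite(activation: float, knowledge: float,
                       decision: float) -> float:
    """Study 3 composite: the dominant friction score, not the average."""
    return max(activation, knowledge, decision)
```

Using the text's own example, scores of 85, 30, and 30 give a composite of 85, where a plain average would report roughly 48 and understate the dominant barrier.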

Score Labels

Dimension scores and composite scores receive descriptive labels to make them easier to interpret. These labels are the same across all studies for dimension scores, and study-specific for composite scores.

Dimension Score Labels

0-39 (Low): your responses lean toward the A-pole of this dimension; the B-pole characteristics are not prominent in your profile.
40-69 (Moderate): your responses are balanced or mixed for this dimension; it plays a role in your profile without defining it.
70-100 (High): your responses lean strongly toward the B-pole; this dimension is a defining characteristic of your profile.
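The label mapping can be expressed directly. How fractional scores at a band boundary (e.g. 39.5) are handled is an assumption; the thresholds themselves come from the table above.

```python
def dimension_label(score: float) -> str:
    """Map a 0-100 dimension score to its descriptive label.
    Boundary handling for fractional scores is assumed."""
    if score < 40:
        return "Low"
    if score < 70:
        return "Moderate"
    return "High"
```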

Composite Score Labels

See the Benchmarks page for composite score label thresholds and what each level means for each study.

What Scores Are Not

Scores are not judgements of quality or performance. A high Vulnerability Index does not mean you are a poor performer: it means your current role overlaps significantly with AI capabilities and adaptation is strategically important. A low AI Adoption composite does not mean you are behind: it may reflect deliberate choices about where AI adds value for your work.

No single score is a complete picture. Profiles are assigned from patterns across all four dimensions, not from the composite alone. Two people with the same composite can have very different dimension patterns, and very different archetypes as a result. The Archetypes page explains how the combination of dimension scores determines which profile you receive.

Scores are calculated from self-report. The multi-method design (tradeoffs, scenarios, MaxDiff, Likert) reduces but does not eliminate self-report bias. Results are most useful as a structured starting point for reflection, not as a definitive external measurement.