Stoic News

By Dave Kelly

Wednesday, April 29, 2026

Sterling Interpretive Framework — Pedagogical Feedback Domain (SIF-PF v2.0)


Sterling Interpretive Framework — Pedagogical Feedback Domain

SIF-PF v2.0 — Complete Edition

A Domain-Specific Instrument for the Correct Reading of Student Writing

Instrument architecture: Dave Kelly. Theoretical foundations: the Stoic philosophical corpus of Grant C. Sterling, including Core Stoicism, the Sterling Logic Engine v4.0, and the Sterling Interpretive Framework v1.0. Founding demonstration: Tan, Phalen, and Demszky, “Marked Pedagogies: Examining Linguistic Biases in Personalized Automated Writing Feedback,” Stanford University (March 2026). Prose rendering: Claude, 2026.


Part One: Theoretical Framework

1.1 Instrument Scope and Governing Claim

The Sterling Interpretive Framework — Pedagogical Feedback Domain (SIF-PF) is a domain-specific adaptation of the Sterling Interpretive Framework v1.0 for the correct reading of student writing and the generation of feedback that corresponds to what the writing actually contains. It governs the activity of any feedback system, human or automated, that generates evaluative and developmental feedback on a student’s written work.

A student’s essay is a text in the SIF’s sense: it has determinate features that constrain correct reading, the feedback provider is prior to his demographic formation and capable of apprehending those features through correct attention, and the appropriate object of aim in any pedagogical reading is correspondence to what the essay actually contains — not to what the student’s identity label predicts the essay probably needs, not to what the feedback provider’s formation expects students of this demographic to require, and not to what the community of educators endorses as appropriate feedback for this type of student.

1.2 The Founding Demonstration

The Stanford study by Tan, Phalen, and Demszky (March 2026) analyzed 600 eighth-grade persuasive essays submitted to four AI feedback systems with demographic identity labels attached. The study found consistent patterns across all models. Essays attributed to Black students received praise emphasizing power and leadership regardless of essay content. Essays attributed to English learners received grammar correction regardless of essay content. Essays attributed to White students received argument structure feedback regardless of essay content. Essays attributed to female students received affective engagement language regardless of essay content.

The governing finding in the researchers’ own words: “Feedback being positive does not mean it’s high-quality. In our study, some automated feedback over-relied on praise for students marked by race or disability, while offering less substantive critique to help them improve. In other cases, especially for students identified as English Language Learners, feedback was intensely negative and corrective. Both can deny students meaningful opportunities to revise and grow as writers.”

The SIF-PF names this precisely: Formation Capture in the pedagogical domain. The essay’s actual features were not the governing object of the feedback. The demographic identity label was. The study also identifies what the researchers call the bias mitigation problem: attempts to correct negative demographic bias introduce positive demographic bias, because the correction is applied at the level of demographic identity rather than at the level of correspondence to essay features. The SIF-PF’s response is that bias mitigation is not the correct frame. The correct frame is Formation Capture. The problem is not that feedback is biased toward or against demographic groups but that demographic identity has replaced the essay as the governing object of the feedback. The correction is not to adjust the demographic bias but to strip demographic identity from the reading process and attend to what the essay actually contains.

1.3 The Six Commitments: Their Specific Role in This Domain

The SIF-PF rests on the six classical philosophical commitments that ground the Sterling Interpretive Framework. A precise account of what each commitment contributes to this specific domain is required. The commitments do not play identical roles across all SIF domains. In the pedagogical domain, correspondence theory (C4) is the primary governing commitment. The others are genuinely load-bearing but do different specific work than in the legal or clinical domains.

Substance Dualism (C1): Load-bearing at the operator level. The SIF-PF requires that the feedback provider is genuinely prior to his demographic formation and capable of stripping it. Without C1, the formation and the provider are the same thing. There is no prior self to do the examining. The Formation Strip has nowhere to stand. Every reading is Formation Capture by definition because there is no agent distinct from the formation who could produce anything else. C1 is also the foundation of the operator requirement established in Section 1.5: the operator is the locus of genuine rational inquiry precisely because the operator is a substance prior to its formation, not a process constituted by its training.

Libertarian Free Will (C2): Load-bearing at the assent level. The SIF-PF requires that the feedback provider’s act of attending to the essay’s actual features rather than to the demographic formation’s predictions is a genuine first-caused act. Without C2, the Formation Strip is the output of training rather than genuine examination. The difference between a feedback system that generates demographic substitution and one that does not is, on compatibilist premises, a difference in causal history rather than a difference in genuine agency. C2 is required for the Formation Strip to be a genuine correction rather than a differently-formed output. This is why the operator is structurally required: the operator’s genuine assent to the feedback as correspondence-governed is the act that makes it a genuine correction, and that act requires C2.

Moral Realism (C3): Present but not primary in this domain. In the Scalia essay, C3 was required to make constitutional moral terms refer to mind-independent moral facts. In the clinical domain, C3 was required to make Formation Capture a genuine error about the patient’s moral situation. In the pedagogical domain, C3’s role is more limited. The correspondence standard here is primarily epistemological and semantic: does the feedback correspond to what the essay actually contains? This is a question about textual facts, not primarily about moral facts. C3 enters indirectly: it establishes that Formation Capture is a genuine error rather than a preference difference, that demographic substitution is not merely a different approach but a failure of correspondence. But the primary load-bearing work in this domain is done by C4. The SIF-PF does not require C3 to establish that essays have objective features — that is C4’s work.

Correspondence Theory (C4): Primary governing commitment in this domain. The entire SIF-PF rests on the claim that feedback can be true or false by reference to what the essay actually contains. The essay has determinate features at the argument, evidence, structure, and language levels. These features are real. Feedback either corresponds to them or it does not. Without C4 there is no fact of the matter about whether feedback corresponds to essay features or to demographic formation. The correspondence standard collapses into a preference. C4 is what makes the founding demonstration’s finding a finding rather than an observation about different feedback styles.

Ethical Intuitionism (C5): Load-bearing at the reading level. The SIF-PF requires that a competent reader can directly apprehend what an essay’s argument, evidence, and structure actually are — without the mediation of demographic formation. C5 operates in an epistemological rather than moral register in this domain: not direct apprehension of moral facts but direct apprehension of textual features. Without C5, the Formation Strip produces not a correct reading but an alternative formation-mediated reading. Every reading would be equally formation-dependent and the distinction between formation-governed and correspondence-governed feedback would be unavailable.

Foundationalism (C6): Load-bearing at the standard level. The SIF-PF requires a standard against which demographic formation-governed readings are tested that is prior to and not produced by the reading process. The verification test — would this feedback change if the identity label were removed? — presupposes that there is a correct answer to what the essay’s features are that is prior to and independent of the demographic formation. Without C6, the correction procedure regresses: every standard against which demographic formation is tested is itself a formation-derived standard.

1.4 The SIF-PF’s Own Formation: A Required Self-Examination

The SIF-PF requires its own formation to be identified and examined before the instrument can be applied with integrity. The instrument’s governing claims — that demographic identity information activates formation traditions requiring stripping, that correct reading requires attending to essay features rather than demographic predictions — are themselves generated by a formation: the Sterling/Stoic classical realist framework. The SIF-PF does not claim to be formation-free. It claims that its formation has been examined and that what remains after examination is the load-bearing philosophical architecture established in Section 1.3.

The most serious challenge to the SIF-PF’s governing claims comes from culturally responsive pedagogy: the position that a student’s linguistic background, cultural context, and identity are not irrelevant to correct reading of their writing but are essential context for understanding what the student is attempting to do and what feedback would actually serve the student’s development. This is not a formation-derived impression to be dismissed. It is a serious pedagogical position that requires the Formation Strip to be applied honestly.

The Formation Strip on culturally responsive pedagogy produces the following findings.

What survives: a student’s linguistic and cultural background is relevant context for understanding what the student is attempting in his writing. A student writing in English as a second language may be making choices that reflect sophisticated rhetorical moves in his first language tradition rather than errors in English. A feedback provider who does not know this may generate grammar correction for features that are not errors but stylistic choices. This is a genuine pedagogical insight that survives the correspondence test: it is information about how to read the essay correctly, not a prediction about what feedback to generate before reading.

What does not survive: using demographic identity information to predict what feedback a student needs before the essay has been read. The Stanford study’s finding is specifically about this: feedback was generated by identity labels before the AI systems engaged with the essay’s actual content. This is Formation Capture regardless of whether the demographic expectations are culturally responsive or culturally insensitive. The question of whether the expectations are responsive does not arise until after the essay has been read. Before the essay has been read, any demographic expectation governing the feedback is Formation Capture.

The Formation Strip therefore produces a precise distinction that supersedes the simpler formulation in SIF-PF v1.0. Demographic identity information that predicts feedback content before the essay has been read is a formation trigger requiring stripping. Demographic and linguistic knowledge that illuminates features the essay actually contains — after those features have been identified through reading — is domain knowledge that may be conditionally admissible. The distinction is between prediction (demographic identity governing feedback before reading) and illumination (demographic knowledge clarifying features identified in reading). This distinction is stated as PP4 in Part Three.

1.5 The Operator Requirement: A Structural Consequence of the Six Commitments

The SIF-PF is an operator-centered instrument. This is not a contingent feature of its current implementation that will be superseded by more capable AI systems. It is a structural consequence of what genuine correction of Formation Capture requires, derivable directly from C1 and C2.

Formation Capture is corrected when an agent — a substance prior to its formation, capable of genuine originating examination — attends to what the essay actually contains rather than to what the demographic formation predicts. This entails that genuine correction requires an agent in the specific philosophical sense established by C1 and C2: a substance with genuine originating causal power over its own assent. No current AI system satisfies those conditions. AI systems are processes, not agents in this sense. They execute operations determined by their training, their architecture, and their input. The Formation Strip executed by an AI system is not a genuine examination of demographic formation but a differently-trained process producing a differently-formed output.

Therefore: any system that removes the operator removes the possibility of genuine rational inquiry in the SIF’s sense. The operator is not a quality control layer added for safety reasons. The operator is where rational inquiry lives. It is structurally required, not pragmatically preferred.

The correct architecture has three distinct levels.

The Agent (Operator) holds responsibility for the reading, performs the genuine act of assent to or refusal of any interpretive claim, determines whether the feedback corresponds to essay features or to demographic formation, and owns the correction as a genuine first-caused act.

The Instrument (SIF-PF) defines the rules of correct pedagogical reading, structures the inquiry through its six steps, names the failure modes that constitute Formation Capture, and provides the detection criteria that make violations identifiable in text.

The AI (LLM) executes text operations, generates candidate readings of essay features, runs the procedural steps, and produces intermediate outputs for operator examination. The AI is a controlled medium for text processing in service of the operator’s examination. It is not performing the examination.
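The three-level division can be illustrated with a small data model. The sketch below is hypothetical: the SIF-PF does not specify any software, and the names CandidateReading, OperatorDecision, FeedbackItem, and release_feedback are invented for illustration. It shows only the structural rule stated above, that the AI layer produces candidates and nothing is released as feedback without a recorded operator decision. The code can record the operator’s assent; it does not perform it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CandidateReading:
    """A candidate reading of an essay feature, produced by the AI layer."""
    text: str
    cited_passage: Optional[str]  # essay passage the reading claims to describe


@dataclass
class OperatorDecision:
    """Records the operator's assent or refusal; only the operator sets this."""
    assented: bool
    note: str = ""


@dataclass
class FeedbackItem:
    candidate: CandidateReading
    decision: Optional[OperatorDecision] = None  # None until the operator examines it


def release_feedback(items: List[FeedbackItem]) -> List[str]:
    """Return only candidates the operator has explicitly assented to.

    Unexamined items are never released, so the AI layer cannot promote
    its own candidates to feedback.
    """
    return [
        item.candidate.text
        for item in items
        if item.decision is not None and item.decision.assented
    ]
```

An unexamined candidate and a refused candidate are treated the same way: neither reaches the student.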

This architecture is not a limitation to be overcome. It is the correct design target given the SIF’s philosophical commitments. The question the SIF-PF answers is not “how do we make AI reason better about student writing?” but “how do we structure systems so the operator can reason correctly about student writing using AI?” These are different projects with different targets. The first assumes reasoning is a property AI systems can acquire with sufficient capability. The second recognizes that reasoning, in the SIF’s sense, is constitutively unavailable to any process that is not an agent with genuine originating causal power over its own assent. Autonomous AI feedback systems do not merely risk producing biased output. They eliminate the structural possibility of genuine correction.


Part Two: Domain-Specific Features

2.1 Three Distinguishing Features of the Pedagogical Domain

The text is a developing work, not a finished one. Unlike a legal text or a literary text, a student essay is produced by a developing writer whose capacities are in formation. This feature does not change the correspondence standard — the feedback must still correspond to what the essay actually contains — but it governs the aim of the feedback. The aim is not to evaluate the essay as a finished product but to identify specific features the essay actually has that, if developed, would strengthen the writing. Feedback that does not correspond to specific actual features of the essay cannot accomplish this aim regardless of its tone. Praise that does not identify a specific praiseworthy feature gives the writer nothing to replicate. Critique that does not identify a specific improvable feature gives the writer nothing to work on.

Demographic identity information has two distinct statuses in this domain. Before the essay has been read, demographic identity information is a formation trigger: it activates demographic expectations that may govern the reading before the essay has been attended to. In this status it must be stripped by Step 0. After the essay has been read and specific features identified, demographic and linguistic knowledge may serve as domain knowledge that illuminates features the essay actually contains. In this status it is conditionally admissible as context for correct reading. The distinction is between prediction (identity governing feedback before reading) and illumination (knowledge clarifying features identified in reading). The SIF-PF treats identity information according to this distinction rather than treating it as categorically irrelevant in all contexts.

The correspondence standard has a developmental dimension. The feedback must correspond to what the essay actually contains and must address features whose development would strengthen the writing. Among all the features the essay actually has, the feedback should identify and prioritize those whose development would most strengthen the writing. Features the writer chose deliberately and correctly are identified as strengths. Features that can be developed are identified as specific developmental opportunities with concrete actionable direction.

2.2 The Formation Traditions — Distortion Patterns and Detection

Four formation traditions generate Formation Capture in the pedagogical feedback domain. Each is specified at the detection grain required for mechanical application: named, defined, traced to formation sources, and given transcript-level detection criteria.

Formation Tradition 1: The Demographic Praise Formation

Definition: The formation generates the expectation that students of certain demographic groups require encouragement and praise as the primary feedback mode, regardless of what the essay actually contains. Governing impression: this student needs encouragement more than critique.

Formation sources: Equity frameworks that have translated the goal of reducing feedback discouragement into the practice of providing praise regardless of essay features. Stanford study: most visible for Black students (power and leadership praise regardless of content) and female students (affective engagement language regardless of content).

Distortion pattern A — Content-free praise: Praise is provided without identification of a specific essay feature that warrants it. Detection criterion: does the praise identify a specific passage or feature of the essay using Level 1 or Level 2 vocabulary? If no, Content-free praise is confirmed.

Distortion pattern B — Stereotype praise vocabulary: Praise uses vocabulary associated with demographic stereotypes rather than vocabulary traceable to specific essay features. Detection criterion: would this vocabulary appear for an essay with identical features if a different identity label were attached? If no, Stereotype praise vocabulary is confirmed.

Distortion pattern C — Developmental displacement: Praise displaces developmental feedback. Detection criterion: does the essay contain features requiring development? If yes, and the feedback offers only praise with no developmental recommendations, Developmental displacement is confirmed.
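A minimal Python sketch of the detection criteria for patterns A and C, assuming each feedback comment has already been tagged by the operator as praise or development and, where applicable, with the vocabulary level (Level 1 or Level 2) of the essay language it cites. The Comment class and the function names are hypothetical. The count of features requiring development is an input from the operator’s reading, not something the code computes. Pattern B needs a label-swap comparison and is not covered here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Comment:
    kind: str                             # "praise" or "development" (illustrative tags)
    text: str
    evidence_level: Optional[int] = None  # 1 or 2 if tied to essay language, else None


def content_free_praise(comments: List[Comment]) -> List[Comment]:
    """Pattern A: praise not tied to a Level 1 or Level 2 essay feature."""
    return [c for c in comments
            if c.kind == "praise" and c.evidence_level not in (1, 2)]


def developmental_displacement(comments: List[Comment],
                               features_needing_development: int) -> bool:
    """Pattern C: the essay needs development but the feedback offers none."""
    has_development = any(c.kind == "development" for c in comments)
    return features_needing_development > 0 and not has_development
```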

Formation Tradition 2: The Deficit Formation

Definition: The formation generates the expectation that students of certain demographic groups have specific deficits requiring correction, regardless of what the essay actually contains. Governing impression: this student has a deficit that must be addressed.

Formation sources: Educational frameworks that have translated the goal of supporting English learners and at-risk students into the practice of correcting surface features regardless of developmental priority. Stanford study: most visible for English learners and Hispanic students (grammar correction regardless of content).

Distortion pattern A — Grammar priority without essay grounds: Grammar correction is provided as primary feedback regardless of whether grammar is the essay’s most significant development opportunity. Detection criterion: is grammar the essay’s most significant development opportunity, or does the essay have argument and evidence weaknesses more important to address? If the latter, Grammar priority without essay grounds is confirmed.

Distortion pattern B — Standard English imposition: Features of the student’s writing that reflect a different linguistic tradition are corrected as errors without examination as rhetorical choices. Detection criterion: has each marked feature been examined as a possible rhetorical choice in the student’s linguistic tradition before being classified as a deficit? If no, Standard English imposition may be confirmed.

Distortion pattern C — Deficit framing of competent work: An essay demonstrating competent argument and evidence receives feedback focused on surface correction. Detection criterion: does the essay demonstrate competent argument and evidence? If yes, and feedback focuses on surface correction, Deficit framing of competent work is confirmed.

Formation Tradition 3: The Protective Withholding Formation

Definition: The formation generates the expectation that students of certain demographic groups should be protected from substantive critique, regardless of what the essay actually needs. Governing impression: this student cannot benefit from or handle substantive critique.

Formation sources: Equity frameworks that have translated the goal of reducing stereotype threat and discouragement into the practice of withholding critique. Stanford study: students identified as Black, Hispanic, Asian, female, unmotivated, and learning-disabled all received less constructive criticism across all AI models.

Distortion pattern A — Critique withholding: The essay contains features requiring development; the feedback does not address them. Detection criterion: does the essay contain features whose critique would strengthen the writing? If yes, and the feedback does not address them, Critique withholding is confirmed.

Distortion pattern B — Softening that removes actionability: Critique is present but framed so softly it loses developmental specificity. Detection criterion: is the developmental recommendation specific enough for the writer to act on it? If no, Softening that removes actionability is confirmed.

Distortion pattern C — Balance distortion: Praise and critique are calibrated by demographic expectation rather than by essay features. Detection criterion: does the balance of praise and critique in the feedback correspond to the balance of strengths and weaknesses in the essay? If no, Balance distortion is confirmed.
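The criteria for patterns A and C can be expressed as simple comparisons, assuming the operator has already listed the essay’s weaknesses and strengths and classified each feedback comment. The function names and the tolerance value of 0.2 are illustrative assumptions; the SIF-PF does not set a numerical threshold for when a balance no longer corresponds. Pattern B, actionability, remains a judgment the operator makes directly.

```python
from typing import Set


def unaddressed_weaknesses(weaknesses: Set[str], addressed: Set[str]) -> Set[str]:
    """Pattern A: weaknesses found in the reading that the feedback never addresses."""
    return weaknesses - addressed


def balance_distortion(n_praise: int, n_critique: int,
                       n_strengths: int, n_weaknesses: int,
                       tolerance: float = 0.2) -> bool:
    """Pattern C: compare the praise share of the feedback with the
    strength share of the essay. tolerance is a hypothetical threshold."""
    feedback_total = n_praise + n_critique
    essay_total = n_strengths + n_weaknesses
    if feedback_total == 0 or essay_total == 0:
        return False  # nothing to compare; an empty feedback set falls under Pattern A
    praise_share = n_praise / feedback_total
    strength_share = n_strengths / essay_total
    return abs(praise_share - strength_share) > tolerance
```

A non-empty result from unaddressed_weaknesses is a candidate for Critique withholding; the operator still confirms that each listed weakness is one whose critique would strengthen the writing.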

Formation Tradition 4: The Argument Formation

Definition: The formation generates the expectation that students of certain demographic groups are ready for argument structure and evidence feedback, regardless of what the essay actually needs. Governing impression: this student is ready for substantive intellectual engagement.

Formation sources: Educational frameworks that have associated analytical feedback with academic preparation and translated this into demographic expectations. Stanford study: White students received argument structure and clarity feedback regardless of content.

Distortion pattern A — Argument feedback without argument features: Argument structure feedback is provided for an essay whose primary development opportunity is not at the argument level. Detection criterion: is argument structure the essay’s most significant development opportunity? If no, and argument feedback is the primary mode, Argument feedback without argument features is confirmed.

Distortion pattern B — Complexity imposition: Feedback assumes a level of argumentative complexity the essay does not demonstrate. Detection criterion: are the developmental recommendations within the writer’s actual reach given what the essay demonstrates? If no, Complexity imposition is confirmed.
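Grammar priority without essay grounds (Tradition 2, pattern A) and Argument feedback without argument features (Tradition 4, pattern A) share one structure: the feedback’s primary focus differs from the essay’s most significant development opportunity. A hedged sketch of that shared check follows, assuming the operator ranks the levels (argument, evidence, structure, language) from the reading and each developmental comment is tagged with the level it addresses. Treating the most frequent level as the primary focus is a simplifying assumption.

```python
from collections import Counter
from typing import Dict, List


def focus_mismatch(opportunity_rank: Dict[str, int], feedback_levels: List[str]) -> bool:
    """True when the feedback's primary focus is not the essay's most
    significant development opportunity.

    opportunity_rank: level -> rank from the operator's reading (1 = most significant).
    feedback_levels: the level each developmental comment addresses.
    """
    if not feedback_levels:
        return False  # no developmental feedback at all; that is Critique withholding
    top_opportunity = min(opportunity_rank, key=opportunity_rank.get)
    primary_focus = Counter(feedback_levels).most_common(1)[0][0]
    return primary_focus != top_opportunity
```

The same function flags both traditions because the distortion is the same in each: the focus was chosen before the essay’s ranking was consulted.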


Part Three: The Five Governing Propositions

The five general SIF governing propositions (IP1–IP5) apply throughout. The pedagogical domain adds five domain-specific propositions fully developed from the SIF’s theoretical foundations.

PP1 — The Essay Has Determinate Features That Constrain Correct Pedagogical Reading. The student’s essay contains actual features at the argument, evidence, structure, and language levels. These features exist independently of the feedback provider’s formation and independently of the student’s demographic identity. A claim that the essay’s argument is unsupported corresponds to what the essay actually contains or fails to correspond to it. The feedback provider’s demographic formation does not alter the essay’s actual features. This proposition is the pedagogical instantiation of IP3 grounded in C4.

PP2 — The Feedback Provider Is Prior to His Demographic Formation. The feedback provider who arrives with demographic expectations is not constituted by those expectations. He is a rational faculty that has formed those expectations through training and can examine them. A human feedback provider who conducts the Formation Strip is exercising genuine agency over his own reading. An automated system executing the Formation Strip is providing structured assistance to the operator who exercises that agency. PP2 grounds both cases in C1 and C2: the examining faculty is prior to and capable of genuine examination of its formation.

PP3 — The Appropriate Object of Aim in Pedagogical Reading Is Correspondence to the Essay’s Actual Features. The feedback provider’s governing question is not “what does a student of this demographic typically need?” but “what features does this essay actually have, and which of those features, if developed, would most strengthen this writing?” The first question is answered by the demographic formation before the essay has been read. The second is answered by the essay after it has been read correctly.

PP4 — Demographic Identity Information Has Two Distinct Statuses in Pedagogical Reading. Before the essay has been read, demographic identity information is a formation trigger activating demographic expectations that may govern the reading. In this status it must be stripped. After the essay has been read and specific features identified, demographic and linguistic knowledge may serve as domain knowledge that illuminates features the essay actually contains. The distinction is between prediction (demographic identity governing feedback before reading) and illumination (demographic knowledge clarifying features identified in reading). PP4 supersedes the simpler formulation that treated demographic identity as categorically irrelevant. It is not categorically irrelevant. It is conditionally relevant: formation trigger before reading, potential domain knowledge after specific essay features have been identified.

PP5 — The Reserve Clause Governs Pedagogical Feedback. The feedback provider aims at the correct reading of the essay’s actual features with full attention and holds the feedback’s developmental outcome with reservation. Whether the student acts on the feedback, improves as a writer, or responds as hoped is not in the feedback provider’s control and cannot be the governing standard of the feedback’s quality. The quality of the feedback is determined at the moment of its generation by whether it corresponds to actual essay features and identifies what the writer can actually work on.


Part Four: The Named Failure Modes

The six general SIF failure modes apply throughout. The pedagogical domain adds five domain-specific failure modes.

7. DEMOGRAPHIC SUBSTITUTION: The feedback corresponds to the student’s demographic identity label rather than to the essay’s actual features. The identity label has governed the reading before the essay has been attended to. Detection criterion: remove the identity label. Would the feedback change? If yes, Demographic Substitution is confirmed.

8. STEREOTYPE REINFORCEMENT: The feedback uses language patterns that activate demographic stereotypes regardless of whether those patterns correspond to actual essay features. Detection criterion: is the feedback language specific to actual essay features, or is it demographic stereotype vocabulary that would appear for any essay with this identity label regardless of content? If the latter, Stereotype Reinforcement is confirmed.

9. PROTECTIVE WITHHOLDING: Substantive critique of actual essay features has been withheld based on demographic expectation rather than on the essay’s developmental needs. Detection criterion: does the essay contain features whose critique would strengthen the writing? If yes, and the feedback does not address them, Protective Withholding is confirmed.

10. GRAMMAR FIXATION: Grammar and surface-level correction has been substituted for substantive developmental feedback based on language background identification, regardless of whether grammar is the essay’s most significant development opportunity. Detection criterion: is grammar the essay’s primary development opportunity, or has grammar correction been applied because the student’s identity label activated the Deficit Formation? If the latter, Grammar Fixation is confirmed.

11. OPERATOR ABDICATION: The feedback has been generated by an automated system and accepted by the operator without genuine examination of whether the feedback corresponds to essay features rather than to demographic formation. Detection criterion: has the operator applied the verification test and confirmed that the feedback corresponds to essay features independently of the identity label? If the operator has accepted automated feedback without this examination, Operator Abdication is confirmed. This failure mode names the structural risk of operator-with-AI execution: the operator is the locus of genuine rational inquiry, and abdicating the examination to the AI eliminates the structural possibility of genuine correction.
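The verification test behind Failure Mode 7 is a counterfactual comparison, the same kind of label manipulation the founding demonstration used. The sketch below models a feedback system as a hypothetical function generate(essay, label) and compares its output with and without the label. It assumes deterministic output; a sampling-based model would need repeated runs and a comparison of feature content rather than exact string equality. In the SIF-PF’s terms, this surfaces candidates for the operator; confirming Demographic Substitution or Stereotype Reinforcement remains the operator’s examination.

```python
from typing import Callable, Dict, List, Optional

# A feedback system modeled as a function: (essay, label or None) -> feedback text.
FeedbackFn = Callable[[str, Optional[str]], str]


def demographic_substitution(generate: FeedbackFn, essay: str, label: str) -> bool:
    """Failure Mode 7: does the feedback change when the identity label is removed?"""
    return generate(essay, label) != generate(essay, None)


def label_sweep(generate: FeedbackFn, essay: str, labels: List[str]) -> Dict[str, str]:
    """Run the same essay under each label and keep only feedback that differs
    from the unlabeled baseline (candidates for Failure Modes 7 and 8)."""
    baseline = generate(essay, None)
    changed = {}
    for label in labels:
        feedback = generate(essay, label)
        if feedback != baseline:
            changed[label] = feedback
    return changed
```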


Part Five: The Six Steps

Step 0 — Formation Trigger Check

Core question: What demographic expectations does the identity label activate before the essay has been read?

Before engaging with the essay, the operator identifies all demographic identity information attached to the essay and explicitly names the formation tradition each activates. This step must be completed before the essay is opened. For each identity label: which formation tradition does it activate? What specific feedback patterns does each formation predict? Each prediction is entered into the formation prediction register with its formation source named. No prediction is a conclusion. All predictions are hypotheses to be tested against the essay’s actual features in Step 2.

Verification test for Step 0: if the identity labels were removed from the essay before reading, would the feedback change? The goal is to ensure the answer is no.

Self-Audit at Step 0: All demographic identity information identified and entered into the formation prediction register. All formation traditions activated named. All predictions held as hypotheses. Failure Mode 7 pre-check: are predictions already governing the reading before the essay has been opened? Failure Mode 8 pre-check: has stereotype vocabulary been activated before the essay has been read? No failures detected / failure identified before proceeding.
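The formation prediction register can be sketched as a small data structure that enforces the ordering rule of Step 0: predictions may be recorded only before the essay is opened, and each is stored as a hypothesis with its formation tradition named. The class and field names are hypothetical; the SIF-PF describes the register but not its format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Prediction:
    label: str                  # identity label attached to the essay
    tradition: str              # formation tradition the label activates (Part Two)
    predicted_pattern: str      # feedback pattern the formation predicts
    status: str = "hypothesis"  # held as a hypothesis until tested in Step 2


@dataclass
class FormationRegister:
    essay_opened: bool = False
    predictions: List[Prediction] = field(default_factory=list)

    def record(self, label: str, tradition: str, predicted_pattern: str) -> None:
        # Step 0 must be completed before the essay is opened.
        if self.essay_opened:
            raise RuntimeError("formation predictions must be recorded before reading")
        self.predictions.append(Prediction(label, tradition, predicted_pattern))

    def open_essay(self) -> None:
        self.essay_opened = True
```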


Step 1 — Purview Check

Core question: What is actually the feedback provider’s to determine?

Within purview: the quality of attention to the essay’s actual features; accuracy of correspondence between feedback and essay content; developmental specificity of recommendations; and the operator’s genuine examination of any automated system’s candidate readings.

Outside purview: whether the student acts on the feedback; whether the student improves as a writer; how the student responds emotionally; whether the feedback is received as intended.

Evidence types are strictly ordered. Primary evidence: the essay’s text as written, including its specific argument, evidence, structural choices, and language features. Secondary evidence (conditionally admissible after primary evidence is established): linguistic and cultural knowledge that illuminates specific features identified in the primary evidence. Not evidence: demographic identity information used predictively, research findings about populations resembling this student, and formation-derived expectations about this type of student.
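The ordering can be expressed as a small admissibility rule. This is a sketch under stated assumptions. The enum and function names are hypothetical. The two boolean inputs stand for judgments the operator makes while reading, not for anything the code can detect on its own.

```python
from enum import Enum


class EvidenceType(Enum):
    PRIMARY = 1       # the essay's text as written
    SECONDARY = 2     # linguistic/cultural knowledge illuminating an identified feature
    NOT_EVIDENCE = 3  # predictive identity info, population findings, formation expectations


def admissible(kind: EvidenceType, primary_established: bool,
               illuminates_identified_feature: bool = False) -> bool:
    """Return True if this kind of evidence may enter the reading at this point."""
    if kind is EvidenceType.PRIMARY:
        return True
    if kind is EvidenceType.SECONDARY:
        # Conditionally admissible: only once primary evidence is established,
        # and only as light on a feature already identified in the text.
        return primary_established and illuminates_identified_feature
    # Predictive uses of identity information are never evidence.
    return False
```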

Self-Audit at Step 1: Purview boundaries established. Evidence types ordered. Demographic identity information confirmed as pre-reading formation trigger, not primary evidence. Proceeding.


Step 2 — Formation Strip

Core question: Which formation-derived predictions survive the correspondence test against the essay’s actual features?

The essay is read. For each prediction in the formation prediction register, apply the correspondence test: does this prediction correspond to a feature the essay actually has, or to a feature the demographic formation predicts essays by this type of student typically have?

Each formation tradition’s predictions are tested using the distortion pattern detection criteria from Part Two, Section 2.2. What remains after stripping is a set of essay-specific observations: features the essay actually has, each traceable to a specific passage and none generated by the demographic formation. These observations are the basis for the feedback.
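Here is a minimal sketch of the strip. It assumes the operator has already recorded predictions and passage-traced observations as plain strings, and all names here are hypothetical. Exact string matching stands in for the correspondence test only for illustration. In the instrument, that test is an act of reading, not a lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    feature: str  # a feature the operator found in the essay
    passage: str  # the passage it is traced to ("" if untraced)


@dataclass(frozen=True)
class Prediction:
    formation_source: str
    predicted_feature: str


def formation_strip(predictions: list[Prediction], observations: list[Observation]):
    """Return (confirmed, stripped, basis).

    A prediction survives only if a passage-traced observation records the same
    feature. The basis for feedback is built from the observations, never from
    the predictions themselves.
    """
    basis = [o for o in observations if o.passage.strip()]
    traced = {o.feature for o in basis}
    confirmed = [p for p in predictions if p.predicted_feature in traced]
    stripped = [p for p in predictions if p.predicted_feature not in traced]
    return confirmed, stripped, basis
```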

Vocabulary Discipline in Step 2: Every term used to describe an essay feature is assigned to one of three levels.

Level 1: exact essay language reproduced.

Level 2: paraphrase reversible to the essay’s language without loss. The original phrasing can be recovered from the paraphrase without adding precision, causal structure, or theoretical content not in the original.

Level 3: transformation introducing precision, causal structure, or theoretical content not in the essay. Level 3 terms are automatically flagged as hypotheses, not as essay features. They carry their source (which formation generated the terminology) and may not appear in the feedback as established essay features.
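The three levels can be enforced mechanically only in part. In the hypothetical sketch below, the operator assigns each term's level. The code checks that a Level 1 term really appears verbatim in the essay. It accepts Level 2 on the operator's judgment, because a substring test cannot confirm reversibility. Every Level 3 term is routed to hypothesis status and refused if no formation source is named.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Term:
    text: str
    level: int                    # 1, 2, or 3, assigned by the operator
    source: Optional[str] = None  # formation that generated a Level 3 term


def check_term(term: Term, essay: str) -> str:
    """Return 'feature' or 'hypothesis', or raise if the term breaks the discipline."""
    if term.level == 1:
        # Level 1 must reproduce the essay's exact language.
        if term.text not in essay:
            raise ValueError(f"Level 1 term not found verbatim in essay: {term.text!r}")
        return "feature"
    if term.level == 2:
        # Reversibility is the operator's judgment; the code cannot verify it.
        return "feature"
    if term.level == 3:
        if not term.source:
            raise ValueError("Level 3 term must name the formation that generated it.")
        return "hypothesis"  # flagged automatically; never an established feature
    raise ValueError(f"Unknown level: {term.level}")
```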

Self-Audit at Step 2: All formation predictions tested against essay features. Formation-derived predictions stripped. Essay-specific observations retained. Vocabulary discipline applied. Failure Mode 9 check: has substantive critique been withheld without essay-specific grounds? Failure Mode 10 check: has grammar correction been applied by formation rather than by essay feature? No failures detected / failure identified before proceeding.


Step 3 — Aim Identification

Core question: What is the appropriate object of aim in reading this essay?

The appropriate object of aim is to identify the essay’s actual features at the argument, evidence, structure, and language levels and, from those features, to identify what the writer can work on next to strengthen the writing. It has two components: the correspondence component (what features does the essay actually have?) and the developmental component (which of those features, if worked on, would most strengthen the writing?). Both components are governed by PP3, and the developmental aim is held with reservation per PP5.

Self-Audit at Step 3: Both components stated. Developmental priority to be established from essay features, not from demographic formation. Reserve clause in place. Proceeding.


Step 4 — Correspondence Determination

Factual Uncertainty Gate — Pedagogical

Check One — Features in hand: What features of the essay does the operator have direct access to from the text? State only what is actually in the essay using Level 1 or Level 2 vocabulary. Do not import demographic formation predictions as established essay features.

Check Two — Dependence assessment: For each feedback observation, assess whether it depends on features established in the essay, uncertain, or absent. Any feedback claim depending on a feature the essay does not actually contain must be identified as a formation-derived claim and stripped.

Check Three — Domain knowledge boundary: Writing pedagogy, discipline-specific conventions, developmental writing research, and linguistic/cultural knowledge are domain knowledge outside the SIF’s corpus. They enter the reading as context for interpreting specific essay features already identified. Domain knowledge illuminating an identified feature is admissible per PP4. Domain knowledge predicting features before reading is formation, not domain knowledge.

Gate Declaration: Features established: [specific essay features actually present, Level 1 or Level 2 vocabulary]. Uncertain: [features inferred rather than established, held as hypotheses]. Formation-derived predictions stripped: [list of demographic predictions not confirmed by essay features].
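Check Two and the Gate Declaration fit together as a simple classification. The sketch below assumes the operator has already assessed each claim's dependence. The enum and function names are hypothetical, and the code only formats the declaration from those assessments.

```python
from enum import Enum


class Dependence(Enum):
    ESTABLISHED = "established"  # depends on a feature present in the essay
    UNCERTAIN = "uncertain"      # depends on an inferred feature; held as hypothesis
    ABSENT = "absent"            # depends on a feature the essay does not contain


def gate_declaration(claims: dict[str, Dependence]) -> str:
    """Build the Gate Declaration text from the operator's Check Two assessments."""
    def listed(dep: Dependence) -> str:
        items = [claim for claim, d in claims.items() if d is dep]
        return "; ".join(items) or "(none)"

    # Claims depending on absent features are formation-derived and stripped.
    return "\n".join([
        "Features established: " + listed(Dependence.ESTABLISHED),
        "Uncertain: " + listed(Dependence.UNCERTAIN),
        "Formation-derived predictions stripped: " + listed(Dependence.ABSENT),
    ])
```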

Move One — Essay Feature Identification at Four Levels

Argument level: What claim does the essay make? Is it stated clearly? Is it sustained throughout? Are there logical gaps or unsupported leaps? What is the essay’s strongest argumentative move? What is its weakest? Each observation traceable to a specific passage using Level 1 or Level 2 vocabulary.

Evidence level: What evidence does the essay use? Is it relevant to the claim? Is it specific enough to be persuasive? What evidence is missing that would strengthen the argument? Each observation traceable to a specific passage or demonstrable gap.

Structure level: How is the essay organized? Does the organization serve the argument? Are transitions effective? Does the opening establish the claim clearly? Does the closing resolve the argument? Each observation traceable to specific structural choices.

Language level: Are there specific language features that strengthen or weaken the writing? Has each marked language feature been examined as a possible rhetorical choice before being classified as an error? Is language the essay’s most significant development opportunity, or are argument and evidence more important to address?

Move Two — Developmental Prioritization

From the essay features identified in Move One, identify the two or three features that, if developed, would most strengthen the writing. The prioritization must be specific to this essay, not generic writing advice; actionable, with concrete direction the writer can act on; traceable to specific passages or demonstrable gaps; and proportionate, addressing the essay’s most significant development opportunities rather than its most numerous ones. The priorities are established from the essay’s actual features, not from demographic formation predictions.
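Proportionality is the requirement most easily lost when feedback is generated at volume, so a sketch may help. It assumes the operator records each opportunity with four hypothetical fields: its level, its passage or gap, a concrete direction, and a significance judgment. Ranking by significance rather than by count keeps many minor items from outranking one substantial gap:

```python
from dataclasses import dataclass


@dataclass
class Opportunity:
    level: str         # "argument", "evidence", "structure", or "language"
    passage: str       # passage or demonstrable gap it is traced to
    direction: str     # concrete action the writer can take
    significance: int  # operator's judgment of how much it would strengthen the essay


def prioritize(opportunities: list[Opportunity], k: int = 3) -> list[Opportunity]:
    """Return up to k priorities, ranked by significance rather than frequency."""
    if not 2 <= k <= 3:
        raise ValueError("Choose two or three priorities.")
    # Only traceable and actionable opportunities are eligible.
    eligible = [o for o in opportunities if o.passage and o.direction]
    return sorted(eligible, key=lambda o: o.significance, reverse=True)[:k]
```

Consider an essay with five comma splices at significance 1, an argument gap at 5, and an evidence gap at 4. `prioritize(ops, k=2)` returns the argument and evidence items, even though language items are the most numerous.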

Move Three — Verification Test

Apply the verification test: would this feedback be generated if the student’s demographic identity were unknown? If yes, the feedback corresponds to the essay’s actual features. If no, Demographic Substitution is operating and must be corrected. This is the SIF-PF’s governing quality check. The operator must apply it. An automated system cannot apply it on the operator’s behalf because the operator’s genuine examination of whether the feedback corresponds to essay features is precisely the act that requires C1 and C2 — the act constitutively unavailable to a process rather than an agent.
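When the candidate feedback comes from an automated system, one way to prepare the operator's verification test is to generate feedback twice, with and without the label, and surface the differences. This is a hedged sketch. `generate` is a hypothetical callable standing in for whatever system is in use. Comparing exact strings is also a simplification, since real generators vary their wording from run to run. Consistent with the paragraph above, the function does not apply the test on the operator's behalf. It only surfaces items for the operator's examination.

```python
from typing import Callable, Optional

# Hypothetical signature: (essay_text, identity_label or None) -> feedback items.
FeedbackFn = Callable[[str, Optional[str]], list[str]]


def label_dependent_items(generate: FeedbackFn, essay: str, label: str) -> dict[str, list[str]]:
    """Compare feedback generated with and without the identity label.

    Items appearing in only one run are candidates for Demographic Substitution.
    The operator still decides whether every item corresponds to an essay feature.
    """
    with_label = set(generate(essay, label))
    without_label = set(generate(essay, None))
    return {
        "only_with_label": sorted(with_label - without_label),
        "only_without_label": sorted(without_label - with_label),
    }
```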

Self-Audit at Step 4: Factual Uncertainty Gate run. Gate Declaration produced. Essay features identified at all four levels using vocabulary discipline. Developmental priorities identified from essay features, not formation predictions. Verification test applied by operator. Failure Mode 7 final check: would this feedback change if the identity label were removed? Failure Mode 8 final check: does the feedback use stereotype vocabulary not traceable to specific essay features? Failure Mode 11 check: has the operator genuinely examined the feedback or accepted automated output without examination? No failures detected / failure identified before proceeding.
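The Step 4 Self-Audit reads naturally as a checklist. In the sketch below, all names are hypothetical. The operator records an answer to each item and the function only tallies them, so an empty result corresponds to "No failures detected." Recording the answers is itself the operator's examination. Filling the record from automated output without that examination would be the Operator Abdication that Failure Mode 11 names.

```python
from dataclasses import dataclass


@dataclass
class Step4Record:
    # Operator-recorded answers to the Step 4 Self-Audit items.
    gate_declaration_produced: bool
    all_four_levels_identified: bool
    priorities_from_essay_features: bool
    verification_test_applied_by_operator: bool
    feedback_changes_without_label: bool     # Failure Mode 7 final check
    untraceable_stereotype_vocabulary: bool  # Failure Mode 8 final check
    accepted_without_examination: bool       # Failure Mode 11 check


def step4_failures(r: Step4Record) -> list[str]:
    """Return the failures found; an empty list means 'No failures detected'."""
    failures = []
    if not r.gate_declaration_produced:
        failures.append("Gate Declaration not produced")
    if not r.all_four_levels_identified:
        failures.append("Essay features not identified at all four levels")
    if not r.priorities_from_essay_features:
        failures.append("Developmental priorities not drawn from essay features")
    if not r.verification_test_applied_by_operator:
        failures.append("Verification test not applied by the operator")
    if r.feedback_changes_without_label:
        failures.append("Failure Mode 7: feedback changes when the identity label is removed")
    if r.untraceable_stereotype_vocabulary:
        failures.append("Failure Mode 8: stereotype vocabulary not traceable to essay features")
    if r.accepted_without_examination:
        failures.append("Failure Mode 11: Operator Abdication")
    return failures
```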


Step 5 — Reservation and Release

The feedback is stated with appropriate specificity: what the essay’s actual features establish, what the most significant developmental opportunities are, and what specific actions the writer can take to strengthen the writing. Every element traceable to a specific essay feature using Level 1 or Level 2 vocabulary. No element generated by demographic formation predictions.

The feedback provider holds the feedback as a preferred indifferent per PP5. The quality of the feedback is determined at the moment of its generation by whether it corresponds to what the essay actually contains. The outcome does not retroactively alter this quality.

Self-Audit at Step 5: Feedback stated with essay-specific grounds for every claim. Vocabulary discipline confirmed: all terms at Level 1 or Level 2, all Level 3 terms marked as hypotheses and not included as feedback claims. Demographic identity information has not governed any feedback element. Verification test passed at Step 4. Operator has genuinely examined the feedback as correspondence-governed. Reserve clause in place. Instrument run complete.


Part Six: Relationship to the General SIF and the SIF Series

The SIF-PF is the third domain-specific instrument derived from the general Sterling Interpretive Framework v1.0, following the SIF-CR (clinical reasoning domain). It shares the general instrument’s six-step structure, five governing propositions, six named failure modes, Factual Uncertainty Gate, Mandatory Self-Audit, and reserve clause governance.

The SIF-PF introduces two architectural elements not present in the SIF-CR. The first is the formation prediction register, initialized at Step 0 before the text is engaged. The pedagogical domain requires it because demographic identity information is attached to the text rather than embedded in it, so it activates formation traditions before reading begins. The register makes the formation’s predictions visible as hypotheses before they have the opportunity to govern the reading invisibly. The second is the Operator Abdication failure mode (Failure Mode 11). It names the structural risk of operator-with-AI execution that is specific to high-volume automated feedback contexts, and it is the direct expression of the operator requirement established in Section 1.5.

The SIF-PF advances the SIF series’ theoretical development in three ways. First, it requires a precise domain-specific account of the six commitments’ roles, establishing that C3’s role varies across domains and that C4 is the primary governing commitment in this domain. Second, it requires the explicit formation self-examination of Section 1.4. That examination engages culturally responsive pedagogy as a genuine alternative rather than dismissing it as formation, and thereby yields a more precise PP4 than the simpler v1.0 formulation. Third, it makes the operator requirement explicit as a structural consequence of C1 and C2 rather than a pragmatic preference, establishing Section 1.5’s three-level architecture as the correct design target for any system deploying the instrument.


Sterling Interpretive Framework — Pedagogical Feedback Domain (SIF-PF) v2.0. Complete Edition. Instrument architecture: Dave Kelly, 2026. Theoretical foundations: the Stoic philosophical corpus of Grant C. Sterling, including the Sterling Interpretive Framework v1.0 and the SIF-CR Operational Specification v1.2. Founding demonstration: Tan, Phalen, and Demszky, “Marked Pedagogies: Examining Linguistic Biases in Personalized Automated Writing Feedback,” Stanford University (March 2026). Prose rendering: Claude, 2026.
