Build Journal (Deterministic Validation Design)

Journal Summary

Generated by ChatGPT

The focus shifted toward orchestration design, deterministic post-processing, and tool integration. While experimenting with CrewAI tools and push notifications, the user discovered an important operational reality: LLMs cannot be relied upon to invoke tools consistently. In response, the architecture was redesigned so that agents generated structured JSON outputs while deterministic Python logic controlled all operational side effects, including push notifications. This represented a key architectural refinement separating semantic interpretation from transactional execution and reinforced the broader principle that deterministic systems should retain operational authority wherever possible.

The remainder of the day focused on expanding the deterministic Date/Window Validation capability. The user identified several real-world healthcare timing nuances missing from the original model, including grace periods, retroactive authorization eligibility, inactive authorization statuses, and operational aging logic. Synthetic datasets were expanded with new fields supporting retro-auth and timing evaluation, and the user designed a deterministic validation flow capable of classifying claims as valid, retro-eligible, expired, or invalid. After evaluating whether this logic belonged inside CrewAI, the user concluded that date validation should remain outside the LLM framework entirely because the rules were explicit, auditable, and computationally deterministic.

Starting Point

Today I will start building tools that the Exception Explanation Agent can call. To start, I will implement a tool that simply prints a message indicating that it was called.

I also learned that CrewAI has several tools available for use, but I’m actually interested in tools that offer some level of customization. I am experimenting with push notifications and am able to send notifications from the summarizer agent using pushover.net. However, I learned that the LLM may ignore specific instructions passed to it or decide they’re unnecessary!

In this particular case, I’d like to only send push notifications when there is a missing authorization. So, what’s the recommended path?

Because the LLM may or may not decide to call a tool, the most reliable pattern is:

Task produces a machine-checkable flag/list (e.g., missing_authorization_claim_ids)
Python code sends the push notification only if that list is non-empty

I already have an easy place to do this deterministically: (main.py)

try:
    result = ExceptionExplanationAgent().crew().kickoff(inputs=inputs)
    print(result.raw)
except Exception as e:
    # Chain the original exception so its traceback is preserved.
    raise Exception(f"An error occurred while running the crew: {e}") from e

And I already have the tool class available: (push_tool.py)

class PushNotificationTool(BaseTool):
    # ...
    def _run(self, message: str) -> str:
        # ...
        requests.post(pushover_url, data=payload)
        # Plain string, not an f-string: the braces would be parsed as a replacement field.
        return '{"notification": "ok"}'

I’ve changed the summarizer task to output JSON instead of Markdown, and to list the missing authorization claim IDs separately for later reference. I’ve also added explicit guidance so that the task outputs a single, correctly formatted JSON object:

  expected_output: >
  
    Output MUST be a single, valid JSON object and nothing else.
    Do NOT include markdown or code fences of any kind (no ```json, no ```, no '''json, no '''), and do NOT add any leading/trailing commentary.
  
    Required JSON keys:
    narratives_markdown: string
    missing_authorization_claim_ids: array of strings
    
    Example shape (do not copy values, do not use code fences):
    { "narratives_markdown": "...", "missing_authorization_claim_ids": ["CLM-..."] }
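Because the prompt can only request valid JSON, not enforce it, a small check on the Python side may be worthwhile before anything downstream relies on the output. The sketch below assumes the crew's raw output is available as a string. `parse_summary` and `REQUIRED_KEYS` are hypothetical names, and the claim ID in the example is made up.

```python
import json

# Required keys and their expected JSON types, taken from the task's expected_output.
REQUIRED_KEYS = {
    "narratives_markdown": str,
    "missing_authorization_claim_ids": list,
}


def parse_summary(raw: str) -> dict:
    """Parse the summarizer output and verify the two required keys."""
    summary = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON text
    if not isinstance(summary, dict):
        raise ValueError("summary must be a JSON object")
    for key, expected_type in REQUIRED_KEYS.items():
        if not isinstance(summary.get(key), expected_type):
            raise ValueError(f"{key!r} is missing or is not a {expected_type.__name__}")
    ids = summary["missing_authorization_claim_ids"]
    if not all(isinstance(claim_id, str) for claim_id in ids):
        raise ValueError("missing_authorization_claim_ids must contain only strings")
    return summary


# Illustrative input only; "CLM-001" is not a real claim ID.
example = '{"narratives_markdown": "...", "missing_authorization_claim_ids": ["CLM-001"]}'
parsed = parse_summary(example)
```

If the model wraps its answer in code fences despite the instructions, `json.loads` fails and the error surfaces immediately instead of producing a silently empty list.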

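Now I’ll parse this JSON output in main.py and call the PushNotificationTool only when missing_authorization_claim_ids is non-empty. The decision to notify is then made in code rather than by the model, so a notification goes out whenever the list contains a claim ID, whether or not the LLM would have chosen to call the tool.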
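A minimal sketch of that gating step, assuming one notification per claim ID (the journal does not say whether IDs are batched). `notify_missing_authorizations` is a hypothetical name, and `send` stands in for whatever actually delivers the message, such as the tool's `_run` method.

```python
import json
from typing import Callable


def notify_missing_authorizations(raw_output: str, send: Callable[[str], object]) -> int:
    """Send one notification per missing-authorization claim ID.

    `send` is any callable that accepts a message string.
    Returns the number of notifications sent.
    """
    summary = json.loads(raw_output)
    claim_ids = summary.get("missing_authorization_claim_ids", [])
    for claim_id in claim_ids:  # an empty list means the loop body never runs
        send(f"Missing authorization for claim {claim_id}")
    return len(claim_ids)


# Stand-in sender so the sketch runs without Pushover credentials.
sent_messages: list[str] = []
count = notify_missing_authorizations(
    '{"narratives_markdown": "...", "missing_authorization_claim_ids": ["CLM-001", "CLM-002"]}',
    sent_messages.append,
)
none_sent = notify_missing_authorizations(
    '{"narratives_markdown": "...", "missing_authorization_claim_ids": []}',
    sent_messages.append,
)
```

Passing the sender in as an argument keeps the decision logic testable without network access. In main.py the same function could presumably be called with `result.raw` and the real tool.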

Push notifications are now working and correctly handle more than one missing authorization (I added a second missing-authorization record to match-results.json). Next, I’ll output the same information for date mismatches, and then I can start building the Date / Window Validation Agent, which is also deterministic.

As a refresher, what is the Date/Window Validation Agent? It is one of several agents that apply to the DATE mismatch event:

Date / Window Validation Agent

Purpose: Validates service date against auth effective/expiration dates, grace periods, and retro-auth eligibility.
Inputs: claim service date, auth effective/expiration dates
Category: Date Mismatch
Complexity: Medium
Type: Deterministic
Notes: Easy to explain; rules are intuitive and auditable.

In reviewing our original match/mismatch data, I found that the available fields only tell me whether the service date fell between the authorization start and end dates, i.e., whether the date matches.

| Field | Supports |
| --- | --- |
| `service_date` | Date being tested |
| `auth_start` | Beginning of authorized window |
| `auth_end` | End of authorized window |

But there are several additional timing nuances that I must consider:

| Classification | Logic |
| --- | --- |
| Valid In-Window (already defined) | `service_date` is between `auth_start` and `auth_end` |
| Valid Within Grace Period | Service date is slightly before/after auth window but inside configured grace period |
| Potential Retro-Auth Eligible | Service occurred before approval/start, but retro-auth is allowed and within lookback |
| Expired Authorization | Service date is after auth end and outside grace period |
| Premature Service | Service date is before auth start and not retro/grace eligible |
| Invalid Due to Auth Status | Auth exists, but is denied, voided, cancelled, pending, or inactive |
| Needs Manual Review | Required rule fields are missing or ambiguous |

For Phase 1, I’ll expand the original data set to include these additional fields so that I can determine whether the mismatch is allowable, correctable, retro-eligible, or truly invalid. That will then inform what actions can be taken.

Authorizations (only includes facts about the authorization)

auth_status: Approved, denied, pending, cancelled, expired, or voided
auth_type: Standard, retro, or urgent
auth_created_date: Date the authorization was created
auth_approval_date: Date the authorization was approved

Claims

claim_received_date: Useful for timeliness and operational aging logic

Member Eligibility

payer_id: Payer ID

Drug Reference (using this as a proxy rule engine)

retro_auth_allowed_flag: Indicates whether retro authorization is permitted
retro_auth_lookback_days: How far back retro-auth can apply
grace_period_before_days: Allows service before auth start, if applicable
grace_period_after_days: Allows service after auth end, if applicable

I asked ChatGPT to create expanded datasets and a new deterministic matching application. But before I go any further, it’s time to populate the agentic solution template for this Date/Window Validation Agent. We don’t have to go overboard here, but it’s important that we implement a few select use cases.

But first, the question is - do we want the crew to perform the date/window validation, or do we want to perform this step outside of the CrewAI framework, and keep it deterministic?

After some additional thought, there is no benefit to putting this deterministic post-processing step into agents.yaml/tasks.yaml since the LLM isn’t needed here to decide anything. Thus, I’ll continue using CrewAI as a producer of the output/summarization.json file, and then run the validation in Python.

Given this, there are still a few options in terms of creating this new function:

Option A: Use CrewAI’s lifecycle hooks/callbacks. This can be clean if I want the “after summarization” logic to live next to the crew definition, but it’s optional.

Option B: Create a standard Python function, since this isn’t for LLM tool use. Thus, our function won’t be a BaseTool at all. Instead, I’ll just write a plain Python function/class (e.g. validate_date_mismatches(summary: dict) -> list[str]) and call it from main.py.

OK, I’ll create a standard Python function to handle this logic. First, I need to identify the specific date checks to perform, starting with preprocessing. I’ll keep this simple for now:

Preprocessing

effective_auth_start = auth_start - grace_period_before_days
effective_auth_end = auth_end + grace_period_after_days
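In Python, the two preprocessing formulas map onto `datetime.timedelta` arithmetic. This is a sketch that assumes the dates have already been parsed into `datetime.date` objects and the grace periods are whole days. `effective_window` is a hypothetical helper name and the example dates are invented.

```python
from datetime import date, timedelta


def effective_window(
    auth_start: date,
    auth_end: date,
    grace_period_before_days: int = 0,
    grace_period_after_days: int = 0,
) -> tuple[date, date]:
    """Widen the authorization window by the configured grace periods."""
    effective_auth_start = auth_start - timedelta(days=grace_period_before_days)
    effective_auth_end = auth_end + timedelta(days=grace_period_after_days)
    return effective_auth_start, effective_auth_end


# Invented example: a 2-day grace period before and a 5-day grace period after.
start, end = effective_window(date(2024, 5, 10), date(2024, 6, 9), 2, 5)
print(start, end)  # 2024-05-08 2024-06-14
```

Defaulting both grace periods to zero means a drug with no configured grace period falls back to the plain authorization window.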

Checks

Is the service date within the allowable window?
- YES -> (service_date >= effective_auth_start) && (service_date <= effective_auth_end)
- Otherwise -> Ineligible
Is the authorization retro-eligible? (i.e., service occurred before approval/start, but retro-auth is allowed and within lookback)
- YES -> (service_date < effective_auth_start) && (retro_auth_allowed_flag = TRUE) && (days between service_date and effective_auth_start <= retro_auth_lookback_days)
- Otherwise -> Ineligible

This logic appears straightforward, but I’ll need access to the original datasets, not just the claim IDs. But I can use the claim IDs to select relevant data to perform these additional date checks. In fact, I need to pull these dates in real time to perform these checks since they are all associated with that mismatched claim.

After some additional research, I learned that it is best to separate the checks:

within_standard_window = (
    service_date >= effective_auth_start
    and service_date <= effective_auth_end
)

# Assumes service_date and auth_start are datetime.date values.
retro_eligible = (
    service_date < auth_start
    and retro_auth_allowed_flag == True
    and (auth_start - service_date).days <= retro_auth_lookback_days
)

Now, examining the validation sequence, here are the steps I need to perform so that I can exit this function with a clean breakdown for each date mismatch:

Pull the claim row by claim_id.
Find candidate authorization(s) using member/drug/provider/date-adjacent logic.
Join in rule fields from drug_reference.
Calculate effective start/end.
Run standard window check.
If standard window fails because service date is before auth start, run retro-auth eligibility check.
Return a structured outcome: valid, date_mismatch, retro_eligible, retro_ineligible, after_auth_end, missing_required_date, etc.
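Steps 4 through 7 of the sequence above can be sketched as a single pure function. This is one possible reading rather than a settled design: it assumes the claim row, authorization, and rule fields have already been joined, that dates are `datetime.date` values (or `None` when missing), and that the retro lookback is measured from `auth_start`. `classify_date_mismatch` is a hypothetical name, and the outcome strings are a subset of those listed in the last step.

```python
from datetime import date, timedelta
from typing import Optional


def classify_date_mismatch(
    service_date: Optional[date],
    auth_start: Optional[date],
    auth_end: Optional[date],
    grace_period_before_days: int = 0,
    grace_period_after_days: int = 0,
    retro_auth_allowed_flag: bool = False,
    retro_auth_lookback_days: int = 0,
) -> str:
    """Return one outcome code for a single joined claim/authorization record."""
    if service_date is None or auth_start is None or auth_end is None:
        return "missing_required_date"

    # Calculate effective start/end.
    effective_auth_start = auth_start - timedelta(days=grace_period_before_days)
    effective_auth_end = auth_end + timedelta(days=grace_period_after_days)

    # Standard window check (including grace periods).
    if effective_auth_start <= service_date <= effective_auth_end:
        return "valid"
    if service_date > effective_auth_end:
        return "after_auth_end"

    # Remaining case: service happened before the window, so try retro-auth.
    days_before_start = (auth_start - service_date).days
    if retro_auth_allowed_flag and days_before_start <= retro_auth_lookback_days:
        return "retro_eligible"
    return "retro_ineligible"


# Invented rule values for illustration.
rules = dict(
    auth_start=date(2024, 5, 10),
    auth_end=date(2024, 6, 9),
    grace_period_before_days=2,
    grace_period_after_days=5,
    retro_auth_allowed_flag=True,
    retro_auth_lookback_days=30,
)
outcomes = [
    classify_date_mismatch(date(2024, 5, 9), **rules),   # inside the grace period
    classify_date_mismatch(date(2024, 4, 20), **rules),  # 20 days before auth_start
    classify_date_mismatch(date(2024, 3, 1), **rules),   # 70 days before auth_start
    classify_date_mismatch(date(2024, 6, 20), **rules),  # past the grace-adjusted end
]
```

Returning a single string per claim keeps the result easy to log and audit. Authorization status checks (denied, voided, and so on) are deliberately left out of this sketch.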

The function needs access to the original datasets or to a joined record that contains all required fields. The claim ID is just the pointer; it is not sufficient on its own. I asked Cursor to create an expanded dataset specifically for the date mismatches. Now I can pull in this dataset to perform the additional date checks.

Here are the steps:

Call the DataValidationTool with the mismatched claim ID.
Load the consolidated dataset that contains the claim ID.
Perform the two checks (standard window and retro eligible; these can always be expanded upon later)
- Calculate effective start/end.
- Run standard window check.
- If standard window fails because service date is before auth start, run retro-auth eligibility check.
Classify/print each claim ID as:
- Valid/invalid within standard window
- Retro eligible/ineligible

I asked Cursor to create scaffolding using this logic, but then decided to experiment in Colab with the consolidated dataset. I was able to calculate the standard window check easily, but realized that I’m missing a few key fields required for retro-authorization eligibility.

Tomorrow, I’ll revisit this new Colab file to get this logic working before I transfer it back into the main project. I may decide to postpone the missing authorization use case, and instead, proceed to a true reasoning agent which pertains to policy exceptions. This will require additional research.