Journal Summary
Generated by ChatGPT
The focus shifted toward orchestration design, deterministic post-processing, and tool integration. While experimenting with CrewAI tools and push notifications, the user discovered an important operational reality: LLMs cannot be relied upon to invoke tools consistently. In response, the architecture was redesigned so that agents generated structured JSON outputs while deterministic Python logic controlled all operational side effects, including push notifications. This represented a key architectural refinement separating semantic interpretation from transactional execution and reinforced the broader principle that deterministic systems should retain operational authority wherever possible.
The remainder of the day focused on expanding the deterministic Date/Window Validation capability. The user identified several real-world healthcare timing nuances missing from the original model, including grace periods, retroactive authorization eligibility, inactive authorization statuses, and operational aging logic. Synthetic datasets were expanded with new fields supporting retro-auth and timing evaluation, and the user designed a deterministic validation flow capable of classifying claims as valid, retro-eligible, expired, or invalid. After evaluating whether this logic belonged inside CrewAI, the user concluded that date validation should remain outside the LLM framework entirely because the rules were explicit, auditable, and computationally deterministic.
Starting Point
Today I will start building tools that the Exception Explanation Agent can call. To start, I will implement a tool that simply prints a message indicating that it was called.
I also learned that CrewAI has several tools available for use, but I’m actually interested in tools that offer some level of customization. I am experimenting with push notifications and am able to send notifications from the summarizer agent using pushover.net. However, I learned that the LLM may ignore specific instructions passed to it or decide they’re unnecessary!
In this particular case, I’d like to only send push notifications when there is a missing authorization. So, what’s the recommended path?
Because the LLM may or may not decide to call a tool, the most reliable pattern is:
- Task produces a machine-checkable flag/list (e.g. missing_authorization_claim_ids)
- Python code sends the push notification only if that list is non-empty
I already have an easy place to do this deterministically: (main.py)
try:
    result = ExceptionExplanationAgent().crew().kickoff(inputs=inputs)
    print(result.raw)
except Exception as e:
    # Chain the original exception so the traceback is preserved.
    raise Exception(f"An error occurred while running the crew: {e}") from e
And I already have the tool class available: (push_tool.py)
class PushNotificationTool(BaseTool):
    # ...
    def _run(self, message: str) -> str:
        # ...
        requests.post(pushover_url, data=payload)
        # Plain string, not an f-string: the braces here are literal JSON.
        return '{"notification": "ok"}'
I’ve adjusted the summarizer task output to reflect JSON and not markdown, and have separated the missing authorization claim IDs for later reference. In fact, I’ve added specific guidance to output a correctly formatted JSON file:
expected_output: >
  Output MUST be a single, valid JSON object and nothing else.
  Do NOT include markdown or code fences of any kind (no ```json, no ```, no '''json, no '''), and do NOT add any leading/trailing commentary.
  Required JSON keys:
    narratives_markdown: string
    missing_authorization_claim_ids: array of strings
  Example shape (do not copy values, do not use code fences): { "narratives_markdown": "...", "missing_authorization_claim_ids": ["CLM-..."] }
Now, I’ll parse the new JSON file to call the PushNotificationTool when appropriate. This will guarantee that notifications are sent only when there is a missing auth, regardless of model behavior.
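Here is a minimal sketch of that deterministic step, assuming the crew's raw output is the JSON object described above. The function names, the message wording, and the `send` callable (a stand-in for whatever wraps PushNotificationTool) are hypothetical and not part of the project. The fence-stripping is only a defensive fallback in case the model ignores the formatting instructions.

```python
import json
from typing import Callable

FENCE = "`" * 3  # a markdown code fence, built indirectly to keep this example readable


def extract_missing_auth_ids(raw_output: str) -> list[str]:
    """Parse the summarizer's raw JSON text and return the missing-auth claim IDs."""
    text = raw_output.strip()
    # Defensive: strip a surrounding code fence if the model added one anyway.
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    summary = json.loads(text)
    ids = summary.get("missing_authorization_claim_ids", [])
    if not isinstance(ids, list):
        raise ValueError("missing_authorization_claim_ids must be a list")
    return [str(claim_id) for claim_id in ids]


def notify_missing_authorizations(raw_output: str, send: Callable[[str], object]) -> int:
    """Send one notification per missing-auth claim; return how many were sent."""
    claim_ids = extract_missing_auth_ids(raw_output)
    for claim_id in claim_ids:
        send(f"Missing authorization for claim {claim_id}")
    return len(claim_ids)
```

In main.py this could run right after kickoff, passing `result.raw` and a small function that forwards the message to the push tool. An empty list sends nothing, so the model never decides whether a notification goes out.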
At present, push notifications are working and can distinguish among multiple missing authorizations (I’ve added a second missing authorization record to match-results.json). Next I’ll output the same information for date mismatches, and then I can start building the Date / Window Validation Agent, which is also deterministic.
As a refresher, what is the Date/Window Validation Agent? It is one of several agents that apply to the DATE mismatch event:
Date / Window Validation Agent
- Purpose: Validates service date against auth effective/expiration dates, grace periods, and retro-auth eligibility.
- Inputs: claim service date, auth effective/expiration dates
- Category: Date Mismatch
- Complexity: Medium
- Type: Deterministic
- Notes: Easy to explain; rules are intuitive and auditable.
In reviewing our original match/mismatch data, I have data fields that only tell me whether the service date fell between the authorization start and end dates, i.e., whether the dates match.
| Field | Supports |
|---|---|
| service_date | Date being tested |
| auth_start | Beginning of authorized window |
| auth_end | End of authorized window |
But there are several additional timing nuances that I must consider:
| Classification | Logic |
|---|---|
| Valid In-Window (already defined) | service_date is between auth_start and auth_end |
| Valid Within Grace Period | Service date is slightly before/after auth window but inside configured grace period |
| Potential Retro-Auth Eligible | Service occurred before approval/start, but retro-auth is allowed and within lookback |
| Expired Authorization | Service date is after auth end and outside grace period |
| Premature Service | Service date is before auth start and not retro/grace eligible |
| Invalid Due to Auth Status | Auth exists, but is denied, voided, cancelled, pending, or inactive |
| Needs Manual Review | Required rule fields are missing or ambiguous |
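The classifications in the table can be pinned down as a fixed set of labels so downstream code never compares free-form strings. The sketch below is an assumption about how that might look, not the project's actual code: the enum name, the label values, and the `status_gate` helper are all hypothetical. The blocking statuses mirror the "Invalid Due to Auth Status" row.

```python
from enum import Enum


class DateClassification(Enum):
    VALID_IN_WINDOW = "valid_in_window"
    VALID_WITHIN_GRACE = "valid_within_grace_period"
    RETRO_ELIGIBLE = "potential_retro_auth_eligible"
    EXPIRED_AUTH = "expired_authorization"
    PREMATURE_SERVICE = "premature_service"
    INVALID_AUTH_STATUS = "invalid_due_to_auth_status"
    NEEDS_MANUAL_REVIEW = "needs_manual_review"


# Statuses that disqualify an authorization regardless of its dates.
BLOCKING_STATUSES = {"denied", "voided", "cancelled", "pending", "inactive"}


def status_gate(auth_status):
    """Return a classification if the status alone decides the outcome, else None."""
    if auth_status is None or not str(auth_status).strip():
        # A missing status is a missing rule field, so route to manual review.
        return DateClassification.NEEDS_MANUAL_REVIEW
    if str(auth_status).strip().lower() in BLOCKING_STATUSES:
        return DateClassification.INVALID_AUTH_STATUS
    return None  # status is fine; continue on to the date checks
```

Running the status gate first keeps the date logic simple, since a denied or voided authorization is invalid no matter where the service date falls.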
For Phase 1, I’ll expand the original data set to include these additional fields so that I can determine whether the mismatch is allowable, correctable, retro-eligible, or truly invalid. That will then inform what actions can be taken.
Authorizations (only includes facts about the authorization)
- auth_status: Approved, denied, pending, cancelled, expired, voided
- auth_type: Standard vs. retro vs. urgent
- auth_created date: Date the authorization was created.
- auth_approval_date: Date the authorization was approved.
Claims
- claim_received_date: Useful for timeliness and operational aging logic
Member Eligibility
- payer_id: Payer ID
Drug Reference (using this as a proxy rule engine)
- retro_auth_allowed_flag: Indicates whether retro authorization is permitted
- retro_auth_lookback_days: How far back retro-auth can apply
- grace_period_before_days: Allows service before auth start, if applicable
- grace_period_after_days: Allows service after auth end, if applicable
I asked ChatGPT to create expanded datasets and a new deterministic matching application. But before I go any further, it’s time to populate the agentic solution template for this date/window validation agent. We don’t have to go overboard here, but it’s important that we implement a few select use cases.
But first, the question is - do we want the crew to perform the date/window validation, or do we want to perform this step outside of the CrewAI framework, and keep it deterministic?
After some additional thought, there is no benefit to putting this deterministic post-processing step into agents.yaml/tasks.yaml since the LLM isn’t needed here to decide anything. Thus, I’ll continue using CrewAI as a producer of the output/summarization.json file, and then run the validation in Python.
Given this, there are still a few options in terms of creating this new function:
Option A: CrewAI supports lifecycle hooks/callbacks which can be clean if you want the “after summarization” logic to live next to the crew definition—but it’s optional.
Option B: Create a standard Python function, since this isn’t for LLM tool use. Thus, our function won’t be a BaseTool at all. Instead, I’ll just write a plain Python function/class (e.g. validate_date_mismatches(summary: dict) -> list[str]) and call it from main.py.
Ok, I’ll create a standard Python function to handle this logic and I need to identify the specific date checks that I need to perform, starting with pre-processing - I’ll keep this simple for now:
Preprocessing
- effective_auth_start = auth_start - grace_period_before_days
- effective_auth_end = auth_end + grace_period_after_days
Checks
- Is the service date within the allowable window?
  - YES -> (service_date >= effective_auth_start) && (service_date <= effective_auth_end)
  - Otherwise -> Ineligible
- Is the claim retro-eligible? (i.e., service occurred before approval/start, but retro-auth is allowed and within lookback)
  - YES -> (service_date < effective_auth_start) && (retro_auth_allowed_flag == TRUE) && (days between service_date and effective_auth_start <= retro_auth_lookback_days)
  - Otherwise -> Ineligible
This logic appears straightforward, but I’ll need access to the original datasets, not just the claim IDs. But I can use the claim IDs to select relevant data to perform these additional date checks. In fact, I need to pull these dates in real time to perform these checks since they are all associated with that mismatched claim.
After some additional research, I learned that it is best to separate the checks:
within_standard_window = (
    service_date >= effective_auth_start
    and service_date <= effective_auth_end
)
# Retro eligibility is measured against the raw auth_start, not the grace-adjusted start.
retro_eligible = (
    service_date < auth_start
    and retro_auth_allowed_flag == True
    and days_between(service_date, auth_start) <= retro_auth_lookback_days
)
Now, examining the validation sequence, here are the steps I need to perform so that I can exit this function with a clean breakdown for each date mismatch:
- Pull the claim row by claim_id.
- Find candidate authorization(s) using member/drug/provider/date-adjacent logic.
- Join in rule fields from drug_reference.
- Calculate effective start/end.
- Run standard window check.
- If standard window fails because service date is before auth start, run retro-auth eligibility check.
- Return a structured outcome: valid, date_mismatch, retro_eligible, retro_ineligible, after_auth_end, missing_required_date, etc.
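The sequence above can be sketched as a single plain function. This is a simplified illustration under stated assumptions, not the project's implementation. The datasets are modeled as in-memory dicts and lists, `member_id` and `drug_code` are hypothetical join keys, candidate matching just takes the first authorization for the same member and drug, and `needs_manual_review` is used when the claim, authorization, or rule row cannot be found.

```python
from datetime import date, timedelta


def validate_claim(claim_id, claims, authorizations, drug_reference):
    """Run the validation sequence for one claim and return a structured outcome."""
    def outcome(label):
        return {"claim_id": claim_id, "outcome": label}

    claim = claims.get(claim_id)  # step 1: pull the claim row
    if claim is None:
        return outcome("needs_manual_review")
    # Step 2: candidate authorizations (assumed keys: member_id, drug_code).
    candidates = [a for a in authorizations
                  if a["member_id"] == claim["member_id"]
                  and a["drug_code"] == claim["drug_code"]]
    rules = drug_reference.get(claim["drug_code"])  # step 3: join rule fields
    if not candidates or rules is None:
        return outcome("needs_manual_review")
    auth = candidates[0]  # simplification: a real matcher would rank candidates

    service = claim.get("service_date")
    start, end = auth.get("auth_start"), auth.get("auth_end")
    if service is None or start is None or end is None:
        return outcome("missing_required_date")

    # Step 4: grace-adjusted window.
    eff_start = start - timedelta(days=rules.get("grace_period_before_days", 0))
    eff_end = end + timedelta(days=rules.get("grace_period_after_days", 0))
    if eff_start <= service <= eff_end:  # step 5: standard window check
        return outcome("valid")
    if service > eff_end:
        return outcome("after_auth_end")
    # Step 6: service is before the window, so check retro-auth against auth_start.
    allowed = rules.get("retro_auth_allowed_flag", False)
    lookback = rules.get("retro_auth_lookback_days", 0)
    if allowed and (start - service).days <= lookback:
        return outcome("retro_eligible")
    return outcome("retro_ineligible")


# Made-up sample data: service is 10 days before auth_start, lookback is 14 days.
claims = {"CLM-1": {"member_id": "M1", "drug_code": "D1", "service_date": date(2024, 2, 20)}}
authorizations = [{"member_id": "M1", "drug_code": "D1",
                   "auth_start": date(2024, 3, 1), "auth_end": date(2024, 3, 31)}]
drug_reference = {"D1": {"retro_auth_allowed_flag": True, "retro_auth_lookback_days": 14,
                         "grace_period_before_days": 2, "grace_period_after_days": 3}}
print(validate_claim("CLM-1", claims, authorizations, drug_reference))
# {'claim_id': 'CLM-1', 'outcome': 'retro_eligible'}
```

Returning a small dict per claim keeps the result easy to serialize alongside the summarization output and easy to audit later.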
The function needs access to the original datasets or to a joined record that contains all required fields. The claim ID is just the pointer; it is not sufficient on its own. I asked Cursor to create an expanded dataset specifically for the date mismatches. Now I can pull in this dataset to perform the additional date checks.
Here are the steps:
- Call the DataValidationTool with the mismatched claim ID.
- Load the consolidated dataset that contains the claim ID.
- Perform the two checks (standard window and retro eligible; these can always be expanded upon later)
  - Calculate effective start/end.
  - Run standard window check.
  - If standard window fails because service date is before auth start, run retro-auth eligibility check.
- Classify/print each claim ID as:
  - Valid/invalid within standard window
  - Retro eligible/ineligible
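These steps can be sketched against a consolidated CSV with one joined row per claim. The column names follow the fields listed earlier, but the file layout, the ISO date format, the "true"/"false" flag encoding, and the label strings are all assumptions. Blank rule fields are reported as "unknown" rather than guessed.

```python
import csv
import io
from datetime import date, timedelta


def parse_date(value):
    """Return a date for an ISO string, or None when the cell is blank."""
    value = (value or "").strip()
    return date.fromisoformat(value) if value else None


def classify_rows(csv_text, claim_ids):
    """Map each requested claim ID to (standard_window, retro) labels."""
    wanted, results = set(claim_ids), {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["claim_id"] not in wanted:
            continue
        service = parse_date(row.get("service_date"))
        start, end = parse_date(row.get("auth_start")), parse_date(row.get("auth_end"))
        if service is None or start is None or end is None:
            results[row["claim_id"]] = ("unknown", "unknown")
            continue
        # Grace-adjusted window; blank grace cells are treated as 0 days.
        eff_start = start - timedelta(days=int(row.get("grace_period_before_days") or 0))
        eff_end = end + timedelta(days=int(row.get("grace_period_after_days") or 0))
        window = "valid" if eff_start <= service <= eff_end else "invalid"

        flag = (row.get("retro_auth_allowed_flag") or "").strip().lower()
        lookback = (row.get("retro_auth_lookback_days") or "").strip()
        if window == "valid":
            retro = "not_applicable"
        elif not flag or not lookback:
            retro = "unknown"  # retro rule fields are missing for this row
        elif service < start and flag == "true" and (start - service).days <= int(lookback):
            retro = "eligible"
        else:
            retro = "ineligible"
        results[row["claim_id"]] = (window, retro)
    return results


# Made-up sample rows for illustration only.
SAMPLE = """claim_id,service_date,auth_start,auth_end,grace_period_before_days,grace_period_after_days,retro_auth_allowed_flag,retro_auth_lookback_days
CLM-1,2024-03-10,2024-03-01,2024-03-31,0,0,true,14
CLM-2,2024-02-20,2024-03-01,2024-03-31,0,0,true,14
CLM-3,2024-02-20,2024-03-01,2024-03-31,0,0,,
"""
for claim_id, labels in classify_rows(SAMPLE, ["CLM-1", "CLM-2", "CLM-3"]).items():
    print(claim_id, labels)
```

Reporting "unknown" separately from "ineligible" makes it obvious when a dataset is missing the fields that the retro-authorization check depends on.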
I asked Cursor to create scaffolding using this logic, but then decided to experiment using Colab and the consolidated dataset. I was able to easily calculate the standard window check, but realized that I’m missing a few key fields required for retro-authorization eligibility.
Tomorrow, I’ll revisit this new Colab file to get this logic working before I transfer it back into the main project. I may decide to postpone the missing authorization use case, and instead, proceed to a true reasoning agent which pertains to policy exceptions. This will require additional research.