Build Journal (Tiered Search & Reasoning)

Journal Summary

Generated by ChatGPT

The Authorization Discovery Agent evolved from a conceptual investigative idea into a structured search and ranking framework. The user reframed the problem away from exact matching and toward “diagnostic proximity,” recognizing that the deterministic engine had already proven no direct authorization existed. To investigate plausible operational explanations, the user designed a tiered search methodology using progressively relaxed constraints — such as widening date windows, relaxing drug matching, introducing alternate member IDs, and examining provider-based clues. Each tier was assigned a confidence score, effectively encoding healthcare domain expertise into a deterministic candidate-ranking system inspired by Progressive Constraint Relaxation (PCR).

The user then operationalized this framework using expanded synthetic datasets and exploratory Colab workflows. Several claims lacking authorizations were manually evaluated across multiple search tiers, producing investigative summaries that surfaced alternate member identities, plausible candidate authorizations, and operational gaps. By the end of the day, the project had evolved into a layered hybrid architecture combining deterministic reconciliation, heuristic candidate discovery, structured scoring, and LLM-assisted reasoning. The user concluded that the next step would be modularizing each search tier into reusable functions and integrating OpenAI-generated investigative guidance programmatically.

Starting Point

As a refresher, it is important to remember that we’re not looking for an exact match anymore - that was already made known from our original deterministic matching engine (which identified the fact that there was at least one claim with a missing authorization). Instead, we’re looking for diagnostic proximity to help accelerate the resolution process.

If I stop and think about this problem, I am essentially working with numerous variables that can all play a role in this mismatch event. However, not every variable has the same level of importance. For example, if I locate an approved authorization that has the same provider, same drug, and same member, but the wrong date, that’s a great candidate for further exploration. In contrast, an authorization with all of this information, but is not approved, is a weaker one.

What we have essentially are search tiers (showing 3 for simplicity):

Tier 1 - Strongest candidate match
Tier 2 - Medium candidate match
Tier 3 - Weakest candidate match

And since our job is to identify as many possible candidate matches as possible, we’ll find that each candidate match will be associated with one of these tiers (they are mutually exclusive). Thus, we also need to equate each tier with a numerical score: (for example)

Tier 1 - Strongest candidate match - Score: 100
Tier 2 - Medium candidate match - Score: 75
Tier 3 - Weakest candidate match - Score: 50

And once we’ve identified a set of viable candidate matches and their scores, we can formulate a final summary.

This approach relates very closely to an optimization technique known as Progressive Constraint Relaxation (PCR) which:

… is used to solve complex, tightly constrained problems by gradually relaxing or loosening constraints over time. This approach transforms infeasible solutions into feasible ones in early stages to explore the search space, then gradually tightens constraints to ensure high-quality, feasible final solutions.

So, what does this mean in practical terms for our missing authorization? We could employ the following search tiers, which are organized by the strength of evidence. In essence, and this is critical, we’re encoding domain knowledge into this search tiers to ensure they’re adding value (e.g., member and drug matter more than provider).

Tier 0 - Best possible candidate (score: 100)

same member + same drug + valid window + approved authorization

Tier 1 - Relax the date (score: 90)

same member + same drug + wider window

Tier 2 - Relax drug (score: 80)

same member + related drug + valid window

Tier 3 - Relax member ID (score: 70)

alternate member ID + same drug + valid window

Tier 4 - Focus on the provider (score: 70)

same provider + same drug + same date + different member

Tier 5 - Look for non-approved authorizations (score: 50)

same member + same drug + valid window + non-approved authorizations

To summarize, for each claim, we’re going to search for candidate authorizations. For each candidate, we’ll assign a score based on our layered search algorithm, and offer a final (deterministic) summary. We’ll then pass along this collection of information to the LLM so that it can employ reasoning to explain what we found and provide final guidance:

Example: No approved authorization was found for the resolved member and DrugZ. The closest candidates were authorizations for the same drug and date window, but they belonged to different enterprise members. This suggests the issue is less likely to be a date-window problem and more likely to involve either missing authorization intake, member identity resolution, or incorrect claim/member attribution.

Recommended next action: verify whether the member has alternate IDs not present in the current crosswalk, then search pending/denied/cancelled authorizations for DrugZ around the service date.

With this expanded understanding, the corresponding algorithmic steps are:

Step 0: Start with the mismatched claim (e.g., CLAIM_005 has no corresponding authorization).
Step 1: Normalize inputs (e.g., member_id crosswalk).
Step 2: Identify candidate authorizations (1 or more).
- Tier 1 → Expand the date window
- Tier 2 → Relax drug
- Tier 3 → Relax the member identity
- Tier 4 → Inspect provider clues
- Tier 5 → Inspect non-approved authorizations
Step 3: Score each candidate.
Step 4: Classify the overall outcome.

To accelerate progress, I’ll ask ChatGPT to create new synthetic datasets to aid in development:

Existing (enhanced)

claims.csv
authorizations.csv
member_crosswalk.csv
providers.csv
drug_reference.csv

New datasets

authorization_events.csv → lifecycle tracking
member_aliases.csv → identity ambiguity

I’ll begin by examining these datasets in Colab and then start building the search logic. After several hours of experimentation, I structured my code as follows:

Data Exploration
Step #1: Show claims with valid auth matches.
Step #2: Show claims without matching authorizations.
For each of these claims: (manually edited for now)
- Tier 1: Relax the date.
- Tier 2: Relax the drug.
- Tier 3: Relax the member ID.
- Tier 4: Focus on the provider.
- Tier 5: Look for non-approved authorizations.
Generate deterministic summary.
Generate LLM summary (handled manually for now).

Here are the results:

Claim ID: CLM007

The summary of the tiered search results for CLM007 indicates that no matching authorizations were found across any of the tiers (Tier 1: Same Member + Same Drug, Tier 3: Alternate Member ID, Tier 4: Normalized Provider/Drug/Date, and Tier 5: Non-Approved Authorizations). This means that for CLM007, even after relaxing several matching criteria, a corresponding authorization could not be identified within the dataset for enterprise member EM007 and drug NDC777. This particular claim appears to genuinely lack a matching authorization under the defined search logic.

Claim ID: CLM010

The summary of the tiered search results for CLM010 indicates that no matching authorizations were found in Tier 1 (Same Member + Same Drug), Tier 4 (Normalized Provider/Drug/Date), and Tier 5 (Non-Approved Authorizations). However, alternate member IDs were found in Tier 3 (Alternate Member ID) for enterprise member EM010.

Claim ID: CLM011

The tiered search for CLM011 (enterprise member EM011, drug NDC111) yielded the following results:

Tier 1 Results (Same Member + Same Drug): A matching authorization (AUTH014) was found for EM011 and NDC111, which was approved. This indicates a direct potential match within a relaxed date window.

Tier 3 Results (Alternate Member ID): Alternate member IDs were found for EM011, specifically ‘C1011’ as a Primary alias.

Tier 4 Results (Normalized Provider/Drug/Date): An authorization (AUTH001) was found for the normalized provider ‘P101’ and drug ‘NDC111’. However, this authorization is for enterprise member ‘EM001’, not ‘EM011’, indicating a potential match for the provider and drug but with a different member.

Tier 5 Results (Non-Approved Authorizations): No non-approved authorizations were found for EM011 and NDC111.

In conclusion, for CLM011, a strong candidate authorization (AUTH014) was identified in Tier 1 based on the same member and drug, suggesting that it might be a valid match once date window and approval status are thoroughly checked. Additionally, Tier 3 confirmed member aliases, and Tier 4 identified an authorization matching provider and drug but for a different member.

This is good progress. I’ll stop here for today. Tomorrow I’ll revisit the search logic and make a few adjustments. For example, currently I’m manually changing the claim_id variable to proceed through the search tiers. Instead, I need to move all tier logic into separate functions, and then implement a for loop to call each function for each claim.

As a separate function, I’ll then introduce a call to OpenAI to programmatically obtain the final LLM guidance. That will conclude this phase of the project. It’s only after this point where I’ll consider expanding what I’ve accomplished.