Day-by-day log of a diligence GPT build for a mid-market PE deal team. Started Monday, shipped to production the following Tuesday.
Day 1: Kickoff workshop with the deal team. Pulled the last six CIMs, three management presentations, and four QoE reports the team had reviewed. Identified the seven questions every analyst asked of every document. Those seven questions became the prompt scaffolding.
Days 2 to 3: Built the retrieval layer over the document corpus. Postgres plus pgvector, no fancy framework, no LangChain. The ingestion script is sixty lines. The retrieval query is twenty.
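For a sense of how small the retrieval query can be, here is a minimal sketch, not the script from this build. The `chunks` table, its columns, and the `retrieve` and `to_vector_literal` helpers are all hypothetical, and the connection is assumed to be a psycopg-style DB-API connection. The only pgvector-specific piece is `<=>`, its cosine-distance operator.

```python
# Hypothetical schema (an assumption, not the actual one):
#   chunks(doc_id text, page int, body text, embedding vector)
# "<=>" is pgvector's cosine-distance operator; smaller means closer.
RETRIEVAL_SQL = """
SELECT doc_id, page, body,
       embedding <=> %(query_vec)s::vector AS distance
FROM chunks
ORDER BY distance
LIMIT %(k)s
"""


def to_vector_literal(embedding):
    """Format a sequence of numbers as pgvector text input, e.g. '[0.1,2.0]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def retrieve(conn, query_embedding, k=8):
    """Return the k chunks nearest to the query embedding.

    `conn` is assumed to be a psycopg-style connection whose cursor
    accepts named %(name)s parameters. Producing the query embedding
    is out of scope for this sketch.
    """
    params = {"query_vec": to_vector_literal(query_embedding), "k": k}
    with conn.cursor() as cur:
        cur.execute(RETRIEVAL_SQL, params)
        return cur.fetchall()
```

Passing the vector as a text literal and casting it with `::vector` keeps the sketch free of any adapter registration; a real script might register a pgvector type adapter instead.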
Day 4: First eval harness pass against eight known-answer questions from the historical CIMs. Six out of eight passed without prompt tuning. The two failures were both about footnote-buried adjustments. Spent the afternoon reading footnotes and adjusting the prompt to handle them.
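A known-answer harness of this kind can be very small. The sketch below is an illustration under assumptions, not the harness from this build: the `Case` type, the `run_evals` function, and the substring-match scoring rule are all hypothetical, and `answer_fn` stands in for whatever calls the retrieval layer and the model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Case:
    question: str
    expected: str  # text a correct answer must contain (assumed scoring rule)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting noise does not fail a case."""
    return " ".join(text.lower().split())


def run_evals(
    cases: List[Case], answer_fn: Callable[[str], str]
) -> Tuple[int, List[Tuple[Case, str]]]:
    """Run every case through answer_fn; return (pass count, failures).

    Each failure is kept with the answer that was produced, so the
    misses can be read and the prompt adjusted.
    """
    failures = []
    for case in cases:
        answer = answer_fn(case.question)
        if normalize(case.expected) not in normalize(answer):
            failures.append((case, answer))
    return len(cases) - len(failures), failures


if __name__ == "__main__":
    # Placeholder cases and a stub answer function, for illustration only.
    cases = [
        Case("placeholder question 1", "alpha"),
        Case("placeholder question 2", "beta"),
    ]
    passed, failures = run_evals(cases, lambda q: "Alpha  is the answer")
    print(f"{passed}/{len(cases)} passed")
    for case, answer in failures:
        print(f"FAIL: {case.question!r} -> {answer!r}")
```

Returning the failing answers alongside the pass count is the design choice that matters here: the score says whether a prompt change helped, and the failures say what to fix next.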
Day 5: Internal demo. Two analysts ran it against a CIM the team had reviewed the prior week. The tool surfaced two adjustments the team had missed and one the team had gotten wrong. That demo earned the green light for production deployment.
Days 6 to 7: Slack integration, audit log, and a basic UI for the analysts who did not want to use Slack. Shipped Tuesday end of day. The bigger lesson: the eval harness is what made this a one-week ship, not the model or the framework.
