The brief arrives. Two systems need to talk to each other. "It's just connecting A to B. Shouldn't take long." The estimate goes in at a week, maybe two. The person writing it has done this before and knows better, but the optimism feels defensible at the time.
Then A speaks SOAP and B expects REST. The shared authentication certificate expired two years ago and the person who managed it left the company. The data format on one side hasn't been touched since 2014, and it shows — fields with names like cust_ref_nr_2 and amounts stored as strings with the currency symbol embedded. The other side needs real-time responses, but the only available interface polls every 30 minutes and batches the results.
What looked like connecting pipes turns into archaeology.
The Archaeology Problem
Most systems don't document what they actually do. They document what they were supposed to do when someone wrote the spec. The gap between those two things grows with every year, every undocumented hotfix, and every edge case that got handled with a condition nobody thought to mention.
Before you can integrate two systems, you have to understand both of them: not at the level of the API documentation, but at the level of how they behave under real conditions. What does A actually send when a transaction partially succeeds? What does B do when it receives a duplicate message? What happens to the queue if the downstream service is slow to respond?
These questions rarely have written answers. You find out by testing, or by talking to the engineers who've been closest to the system longest, or by observing what happens when things go wrong in production.
That discovery process is what the one-to-two-week estimate didn't account for. It's also where most integration projects spend most of their time.
The Boundary Questions That Actually Matter
The connection itself — the HTTP call, the message put onto the queue, the file dropped in the folder — is the easy part. What requires real thought is the boundary: where A's responsibility ends and B's begins, and what happens in the gap between them.
Some of the questions worth asking before writing a line of code:
- What happens when A sends data faster than B can process it? Is there a queue? Who owns it? What's the back-pressure strategy?
- Who owns retry logic when a transaction succeeds on A's side but the acknowledgement from B never arrives? Both systems think they've done their job. Nobody has.
- What does "success" actually mean at the integration boundary? Delivery? Processing? Confirmation that the downstream effect happened?
- How do you test this when neither system has a reliable staging environment that reflects production data volume?
- If the integration breaks at 2am, how does anyone know? How do they trace which message failed and why?
These are architecture questions dressed up as connectivity problems. Get them wrong and you end up with an integration that works fine in testing, misbehaves occasionally in production, and fails silently when it matters most.
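To make the back-pressure question concrete, here is a minimal Python sketch of one possible answer: a bounded buffer that tells A when B is falling behind, instead of accepting everything and dropping messages silently. The names (buffer, send_from_a), the queue size, and the timeout are all hypothetical, and an in-process queue stands in for whatever broker a real integration would use.

```python
import queue

# Hypothetical bounded buffer between A (producer) and B (consumer).
# maxsize=3 is an illustrative assumption; size it from B's measured throughput.
buffer = queue.Queue(maxsize=3)

def send_from_a(message, timeout_s=0.01):
    """Try to enqueue a message; report back-pressure instead of dropping it."""
    try:
        buffer.put(message, timeout=timeout_s)
        return "accepted"
    except queue.Full:
        # The boundary decision lives here: reject, slow A down, or spill to disk.
        return "rejected"

# B is not consuming in this sketch, so the buffer fills after three messages.
results = [send_from_a({"id": i}) for i in range(5)]
print(results)
# ['accepted', 'accepted', 'accepted', 'rejected', 'rejected']
```

Rejecting is only one option. The point of the sketch is that the behaviour when the buffer is full is an explicit choice with an owner, not an accident of whichever library default applies.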
What Middleware Does — and Doesn't Do
Message queues, API gateways, event buses, adapter layers: middleware exists to implement the answers to the boundary questions. Queues decouple producers from consumers so A doesn't need to know whether B is available. Gateways enforce rate limits and authentication so neither system needs to manage those concerns. Adapter layers convert formats so A and B can each speak their native language without knowing the other exists.
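As an illustration of the adapter idea, the sketch below converts a legacy record of the kind described earlier (an amount stored as a string with the currency symbol embedded) into a cleaner shape for the other side. The field name cust_ref_nr_2 comes from the example above. Everything else is an assumption made for illustration: the output field names, the symbol table, and the rule that commas are thousands separators.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical symbol table; a real adapter would be driven by A's actual data.
CURRENCY_SYMBOLS = {"$": "USD", "€": "EUR", "£": "GBP"}

def adapt_legacy_record(record):
    """Convert one legacy record from A into the shape B is assumed to expect."""
    raw = record["amount"].strip()
    if not raw or raw[0] not in CURRENCY_SYMBOLS:
        raise ValueError(f"unrecognised currency in amount: {raw!r}")
    # Assumption: commas are thousands separators, not decimal marks.
    number = raw[1:].replace(",", "").strip()
    try:
        amount = Decimal(number)
    except InvalidOperation:
        raise ValueError(f"unparseable amount: {raw!r}") from None
    if not amount.is_finite():
        raise ValueError(f"non-finite amount: {raw!r}")
    return {
        "customer_reference": record["cust_ref_nr_2"],
        "amount": str(amount),  # decimal string, avoids float rounding
        "currency": CURRENCY_SYMBOLS[raw[0]],
    }

adapted = adapt_legacy_record({"cust_ref_nr_2": "C-1001", "amount": "€1,234.50"})
print(adapted)
# {'customer_reference': 'C-1001', 'amount': '1234.50', 'currency': 'EUR'}
```

Note that the adapter raises on anything it does not recognise rather than guessing. What should happen to a record it rejects is a boundary decision the adapter itself cannot make.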
That's genuinely useful. But middleware doesn't fix a poorly thought-out boundary. It formalises it.
If the decision about who owns retry logic was wrong, putting a queue in the middle doesn't fix it. It just makes the wrong decision more durable.
This is where integration projects that were "almost done" stay almost done for months. The queue is in place. The adapter layer is running. Messages are flowing. But something is wrong: duplicates appearing, orders getting dropped in edge cases, the occasional silent failure that only surfaces when a customer calls. The architecture looks correct. The logic underneath it isn't.
Middleware is a multiplier. It amplifies good boundary decisions and makes bad ones harder to fix.
Starting With Failure Modes
The teams that get integrations right don't start with the API contract. They start with the failure modes. The design conversation begins with: what can go wrong here, and what should happen when it does?
The connection drops mid-transfer. Data arrives but doesn't validate. The upstream system sends a duplicate. The downstream system processes the message but can't send a confirmation. Clock skew causes events to arrive out of order. Volume spikes beyond what the receiving system can handle in real time.
Each of those scenarios needs a defined response before the integration is built, not discovered during an incident. The response might be a dead-letter queue and an alert. It might be idempotency keys so duplicates are harmless. It might be a circuit breaker that degrades gracefully instead of failing hard. The specific answer matters less than having one.
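Two of those responses, idempotency keys and a dead-letter queue, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the message fields (idempotency_key, order_id, quantity), the function names, and the in-memory dict and lists are all hypothetical stand-ins for a durable store and a real queue.

```python
processed = {}     # idempotency key -> earlier result (stand-in for a durable store)
dead_letters = []  # stand-in for a dead-letter queue that would also raise an alert
applied = []       # records each real side effect, so duplicates are visible

def apply_order(message):
    """Hypothetical downstream effect; rejects messages that fail validation."""
    if message["quantity"] <= 0:
        raise ValueError("quantity must be positive")
    applied.append(message["order_id"])
    return {"order_id": message["order_id"], "status": "applied"}

def handle(message):
    key = message["idempotency_key"]
    if key in processed:
        # Duplicate delivery: return the earlier result, cause no second effect.
        return processed[key]
    try:
        result = apply_order(message)
    except ValueError as exc:
        # Invalid data is parked with its error instead of being dropped silently.
        dead_letters.append({"message": message, "error": str(exc)})
        return {"order_id": message["order_id"], "status": "dead-lettered"}
    processed[key] = result
    return result

order = {"idempotency_key": "k1", "order_id": 7, "quantity": 2}
first = handle(order)
retry = handle(order)  # e.g. A resends because B's acknowledgement was lost
bad = handle({"idempotency_key": "k2", "order_id": 8, "quantity": 0})

print(first == retry, applied, len(dead_letters))
# True [7] 1
```

The retry returns the same result as the first attempt and the order is applied only once, which is what makes a lost acknowledgement safe to retry. In a real system the key store has to survive restarts and be checked atomically with the side effect, and both of those are exactly the kind of boundary decision the sketch leaves open.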
If you can't answer "what happens when X fails" for the ten most likely failure modes, you haven't designed the integration. You've connected the wires and hoped for the best.
That's not plumbing. Plumbing has standards, tolerances, and a well-understood failure mode for every fitting. Most software integrations don't start with any of that. The discipline has to be built in deliberately, which is why integration work that looks simple almost never is.
If you're scoping an integration project and want a second opinion on the boundary design before the build starts, we're worth talking to. Most integration problems are cheaper to fix in the design phase.