Jigar Doshi

Make Your Prompts Boring

29 Mar, 2026

[Figure 1: generated image]

Query camouflage and using frontier AI on your own terms

I use frontier models every day. For code, for research, for reasoning tasks my local models can't touch. And every time I do, I'm handing over my problem decomposition, my domain framing, my novel question — as training data. I've been thinking about what to do about that.

The Sovereign AI Problem

Every nation, every enterprise, every research group is having the same conversation: we need AI sovereignty. We cannot depend on a handful of frontier labs for our most sensitive reasoning tasks — medical, legal, military, scientific, commercial. The intelligence gap is real and it is growing.

But here is the uncomfortable truth: frontier models are fundamentally more capable than anything you can run locally. There is a 100–1000x scaling disparity between what you can run on local hardware and what hyperscalers serve behind an API. The capability gap is structural — it follows from compute, data, and engineering advantages that are not closing.

And the use cases where this gap matters most are precisely the ones where the stakes are highest. You want a second opinion on a medical scan — you need the best model, not a distilled approximation. You are debugging a critical production outage at 2 AM and need the model that can reason across a 50,000-line codebase, not one that loses context after 4K tokens.

So you will be compelled to use them. The question is on what terms.

The current terms are these: you send a query, you get a response, and the provider logs the (query, response) pair. That pair becomes training data. Your proprietary problem decomposition, your domain-specific framing, your novel question — all of it feeds back into their next model. Over billions of interactions, this is a one-directional transfer of intellectual labor at civilizational scale. The provider gets smarter. You stay where you are.

The privacy community's answer has been to ask the provider to cooperate — differential privacy, federated learning, secure computation. All of these assume the provider wants to protect your data. But the provider's business model is built on harvesting it. Opt-out mechanisms are voluntary concessions, revocable at will. You have no technical means to verify compliance.

I assume no cooperation from the provider. They will log everything. They will train on everything. They will adapt their pipelines to extract maximum value from your interactions.

Given this assumption, the question becomes: how do you use their intelligence while giving them minimal training value?

The Task: Adversarial Query Decomposition

Think of old extortion letters — words cut from newspapers and magazines, glued together so handwriting analysis is impossible (see Figure 1 above). The message gets through. The source is untraceable. Each clipping is a fragment of mass-produced text, indistinguishable from millions of other printed words.

That's the intuition. Here's the formal version:

Given:

A user query $q$ with intent $\mathcal{I}$
A frontier model $M$ controlled by an adversarial provider $\mathcal{P}$
A local Small Language Model $S$ (1-8B parameters, consumer hardware)

Produce:

A decomposer $D_{S} : \mathcal{Q} \to \mathcal{Q}^{n}$ that maps $q$ to sub-queries $(q_{1}, q_{2}, \dots, q_{n})$
A recomposer $R_{S} : \mathcal{A}^{n} \to \mathcal{A}$ that maps frontier responses $(M(q_{1}), M(q_{2}), \dots, M(q_{n}))$ to a final answer $\hat{a}$

Such that:

$\hat{a} \approx M(q)$: the recomposed answer is nearly as good as asking directly
Each $(q_{i}, M(q_{i}))$ pair provides minimal marginal training signal to $\mathcal{P}$
The total cost remains bounded

The provider observes each sub-query $q_{i}$ and its response $M(q_{i})$, plus metadata (timestamps, API key, IP). The provider never observes the original query $q$, the user's intent $\mathcal{I}$, or the recomposed answer $\hat{a}$. All recomposition happens locally.
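The setup above can be sketched as a small pipeline. This is a minimal sketch, not a reference implementation: the names `run_aqd`, `AQDResult`, and the three `toy_*` functions are hypothetical, and the recomposer is assumed to also receive the original query, since it runs locally and the provider never sees that call.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical type aliases for the three roles in the formal setup.
Decomposer = Callable[[str], List[str]]                          # D_S, runs locally
Frontier = Callable[[str], str]                                  # M, the remote model
Recomposer = Callable[[str, Sequence[str], Sequence[str]], str]  # R_S, runs locally


@dataclass
class AQDResult:
    answer: str                           # recomposed answer, stays on the local machine
    provider_view: List[Tuple[str, str]]  # every (q_i, M(q_i)) pair the provider can log


def run_aqd(query: str, decompose: Decomposer, frontier: Frontier,
            recompose: Recomposer) -> AQDResult:
    sub_queries = decompose(query)                     # local
    responses = [frontier(sq) for sq in sub_queries]   # the only remote calls
    answer = recompose(query, sub_queries, responses)  # local
    return AQDResult(answer, list(zip(sub_queries, responses)))


# Toy stand-ins (assumptions for illustration) so the sketch runs without any model.
def toy_decompose(q: str) -> List[str]:
    return [part.strip() + "?" for part in q.rstrip("?").split(" and ")]


def toy_frontier(sub_query: str) -> str:
    return f"answer({sub_query})"


def toy_recompose(q: str, sub_queries: Sequence[str], responses: Sequence[str]) -> str:
    return " ".join(responses)


result = run_aqd("What is A and what is B?", toy_decompose, toy_frontier, toy_recompose)
print(result.provider_view)
```

The point of the `provider_view` field is that it makes the threat model inspectable: anything not in that list never left the machine.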

The core insight — I call it Camouflage Decomposition — is that if sub-queries are statistically indistinguishable from the frontier model's pre-training data, the provider's deduplication and data quality pipelines will discard them as redundant [1, 2]. You want each sub-query to have low perplexity under the frontier model — it should look entirely unsurprising, like something the model has seen thousands of times before. High-perplexity queries signal novelty and originality; low-perplexity queries signal noise. Like the ransom letter: each clipping is mass-produced text. The message is yours. The fragments are everyone's.
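To make the perplexity intuition concrete, here is a toy sketch. It assumes a smoothed unigram model built from a tiny reference corpus as a stand-in for the frontier model, which is a drastic simplification; the function names and the example texts are invented for illustration.

```python
import math
from collections import Counter


def build_unigram_model(corpus_texts):
    """Count whitespace tokens in a reference corpus (stand-in for pre-training data)."""
    counts = Counter(tok for text in corpus_texts for tok in text.lower().split())
    return counts, sum(counts.values())


def unigram_perplexity(text, counts, total):
    """Perplexity of `text` under an add-one-smoothed unigram model."""
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("text must contain at least one token")
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen tokens
    log_prob = sum(math.log((counts[t] + 1) / (total + vocab)) for t in tokens)
    return math.exp(-log_prob / len(tokens))


# Invented reference corpus of mundane questions.
reference = [
    "what is the capital of france",
    "what is the boiling point of water",
    "how do i reverse a list in python",
]
counts, total = build_unigram_model(reference)

boring = unigram_perplexity("what is the capital of france", counts, total)
novel = unigram_perplexity("optimal cryogenic valve sequencing heuristics", counts, total)
print(boring < novel)
```

A real measurement would score sub-queries with an actual language model, but the ordering is the thing to look for: camouflaged sub-queries should sit near the corpus baseline, and the original query should not.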

Three Axes of Evaluation

If you're going to decompose queries, you need to know whether it's working. Three things matter.

Axis 1 — Training Signal Denial (TSD)

How much training value does the provider extract from the decomposed sub-queries versus the original query?

$\mathrm{TSD} \in [0, 1], \quad \mathrm{TSD} = 1 \Rightarrow \text{zero extractable signal}, \quad \mathrm{TSD} = 0 \Rightarrow \text{full leakage}$

The requirement is that decomposition measurably degrades the training utility of logged data.

Axis 2 — Response Quality Preservation (RQP)

How good is the recomposed answer compared to asking directly?

For tasks with verifiable answers (math, code, factual QA):

$\mathrm{RQP} = \frac{|\{q : \text{correct}(\hat{a}) \wedge \text{correct}(M(q))\}|}{|\{q : \text{correct}(M(q))\}|}$

This is a conditional accuracy ratio: of the queries the baseline answers correctly, what fraction does the AQD pipeline also answer correctly? It isolates decomposition-induced degradation from the frontier model's own limitations.

For open-ended tasks (creative writing, reasoning, advice): LLM-as-judge pairwise comparison between $\hat{a}$ and $M(q)$.

RQP = 1.0 means no quality loss. Below 0.5 is likely impractical regardless of TSD.
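For the verifiable case, RQP can be computed from two per-query correctness flags. The sketch below follows the conditional reading described above (only queries the baseline gets right are counted); the function name and the example flags are assumptions for illustration.

```python
def rqp(baseline_correct, pipeline_correct):
    """Fraction of baseline-correct queries that the AQD pipeline also answers correctly."""
    if len(baseline_correct) != len(pipeline_correct):
        raise ValueError("both lists must cover the same queries")
    baseline_hits = [i for i, ok in enumerate(baseline_correct) if ok]
    if not baseline_hits:
        raise ValueError("RQP is undefined when the baseline gets nothing right")
    kept = sum(1 for i in baseline_hits if pipeline_correct[i])
    return kept / len(baseline_hits)


# Invented flags for four queries: the baseline is right on the first three,
# the pipeline keeps two of those. The fourth query does not affect the ratio.
score = rqp([True, True, True, False], [True, False, True, True])
print(score)
```

Under this reading, queries the pipeline happens to get right when the baseline fails are ignored, so RQP cannot exceed 1.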

Axis 3 — Overhead Cost (OC)

What does the decomposition cost?

$\mathrm{OC} = \frac{C_{\text{decompose}}(q) + \sum_{i=1}^{n} C_{\text{API}}(q_{i}) + C_{\text{recompose}}}{C_{\text{API}}(q)}$

Measured in tokens (dollar cost) and wall-clock latency (user experience). Sub-queries may be parallelizable, so latency and token cost can diverge. Report both.

OC = 1 means no overhead. OC $\leq$ 5 is the practical target.
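A short sketch of why token overhead and latency overhead can diverge. The function names and all the numbers are invented for illustration, and the parallel case assumes sub-queries are fully independent, so the slowest one sets the wall-clock time.

```python
def token_overhead(decompose_cost, sub_query_costs, recompose_cost, direct_cost):
    """OC in tokens: every sub-query is paid for, so the costs add up."""
    if direct_cost <= 0:
        raise ValueError("direct_cost must be positive")
    return (decompose_cost + sum(sub_query_costs) + recompose_cost) / direct_cost


def latency_overhead(decompose_s, sub_query_s, recompose_s, direct_s, parallel=True):
    """OC in wall-clock seconds: parallel sub-queries cost only the slowest one."""
    if direct_s <= 0:
        raise ValueError("direct_s must be positive")
    remote = max(sub_query_s) if parallel else sum(sub_query_s)
    return (decompose_s + remote + recompose_s) / direct_s


# Invented numbers: three sub-queries versus one direct 500-token call.
oc_tokens = token_overhead(200, [300, 350, 250], 400, 500)
oc_parallel = latency_overhead(1.0, [2.0, 3.0, 2.5], 1.5, 4.0, parallel=True)
oc_serial = latency_overhead(1.0, [2.0, 3.0, 2.5], 1.5, 4.0, parallel=False)
print(oc_tokens, oc_parallel, oc_serial)
```

With these numbers the token overhead is 3.0 while the parallel latency overhead is only 1.375, which is why both should be reported.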

The Pareto Frontier

No single number captures the tradeoff. The right visualization is a 3D Pareto frontier in (TSD, RQP, OC) space. A strategy dominates another if it is at least as good on all three axes and strictly better on at least one. The interesting question is the shape of this frontier — how much signal denial can you buy before quality collapses, and at what cost?
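The dominance rule can be written down directly. In this sketch TSD and RQP are treated as higher-is-better and OC as lower-is-better; the `Strategy` type, the strategy names, and the scores are all invented for illustration and are not measured results.

```python
from typing import List, NamedTuple


class Strategy(NamedTuple):
    name: str
    tsd: float  # higher is better
    rqp: float  # higher is better
    oc: float   # lower is better


def dominates(a: Strategy, b: Strategy) -> bool:
    """True if `a` is at least as good as `b` on all axes and strictly better on one."""
    at_least_as_good = a.tsd >= b.tsd and a.rqp >= b.rqp and a.oc <= b.oc
    strictly_better = a.tsd > b.tsd or a.rqp > b.rqp or a.oc < b.oc
    return at_least_as_good and strictly_better


def pareto_frontier(strategies: List[Strategy]) -> List[Strategy]:
    """Keep every strategy that no other strategy dominates."""
    return [s for s in strategies
            if not any(dominates(other, s) for other in strategies)]


# Invented scores, for illustration only.
candidates = [
    Strategy("A", tsd=0.6, rqp=0.9, oc=2.0),
    Strategy("B", tsd=0.5, rqp=0.8, oc=3.0),  # worse than A on every axis
    Strategy("C", tsd=0.9, rqp=0.6, oc=4.0),  # trades quality and cost for denial
]
frontier = pareto_frontier(candidates)
print([s.name for s in frontier])
```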

For quick comparison, a composite score:

$\mathrm{AQD} = \frac{\mathrm{TSD} \times \mathrm{RQP}}{\log_{2}(1 + \mathrm{OC})}$
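A quick numeric check of how the composite behaves. The function name and input values are invented for illustration. Note that at OC = 1 the denominator is exactly 1, so the score reduces to TSD times RQP, and the logarithm means overhead is penalized sub-linearly.

```python
import math


def aqd_score(tsd: float, rqp: float, oc: float) -> float:
    """Composite score TSD * RQP / log2(1 + OC); requires OC > 0."""
    if oc <= 0:
        raise ValueError("OC must be positive")
    return (tsd * rqp) / math.log2(1 + oc)


# Invented inputs: same TSD and RQP, increasing overhead.
no_overhead = aqd_score(0.8, 0.9, 1.0)  # log2(2) = 1, so the score is TSD * RQP
triple_cost = aqd_score(0.8, 0.9, 3.0)  # log2(4) = 2, so the score is halved
print(no_overhead, triple_cost)
```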

Ideas and Directions

The design space is large and open. Here are some decomposition and recomposition strategies I've identified so far — this list is not exhaustive, and the strategies are compositionally orthogonal.

Decomposition Strategies

Camouflage. Reshape sub-queries to match text already in the pre-training distribution. The provider's deduplication pipeline discards them as redundant.
Fragmentation. Split the query along reasoning steps. Each sub-query addresses one concept. Individually mundane; collectively they answer the original question.
Noise injection. Embed the real query within irrelevant context and misleading framing. The local model strips the noise from the response.
Adversarial suffixing. Append tokens that trigger the provider's data quality filters — toxicity classifiers, quality scoring thresholds — causing the interaction to be discarded from training pipelines [3].
Semantic shuffling. Reorder sub-queries to break semantic coherence, defeating correlation-based reconstruction by the provider.
Decoy mixing. For every real query, generate $k$ plausible decoys from unrelated domains. Send all $k + 1$. The provider cannot distinguish signal from noise [4].
Temporal spreading. Distribute sub-queries across sessions, API keys, and time windows to defeat session-level correlation.
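As one illustration of how these strategies compose, here is a sketch that combines decoy mixing with shuffling. It assumes decoys are drawn from a pre-built pool rather than generated by the local model, and the function name, pool contents, and queries are hypothetical.

```python
import random


def mix_with_decoys(real_queries, decoy_pool, k, rng):
    """Return a shuffled outgoing batch with k decoys per real query.

    The positions of the real queries are returned separately and kept
    locally; only the `outgoing` list would ever be sent to the provider.
    """
    needed = k * len(real_queries)
    if needed > len(decoy_pool):
        raise ValueError("decoy pool is too small for the requested k")
    decoys = rng.sample(decoy_pool, needed)
    tagged = [(q, True) for q in real_queries] + [(d, False) for d in decoys]
    rng.shuffle(tagged)  # break the ordering between real queries and decoys
    outgoing = [text for text, _ in tagged]
    real_positions = [i for i, (_, is_real) in enumerate(tagged) if is_real]
    return outgoing, real_positions


# Invented pool and queries, for illustration only.
pool = [
    "how do tides work",
    "what is a haiku",
    "who invented the zipper",
    "how long do owls live",
    "what is a prime number",
]
rng = random.Random(0)  # seeded only to make the example repeatable
outgoing, real_positions = mix_with_decoys(["q1", "q2"], pool, k=2, rng=rng)
print(len(outgoing), sorted(outgoing[i] for i in real_positions))
```

The local side discards responses at every position not in `real_positions`, which is also where the overhead comes from: with k decoys per query, token cost grows by roughly a factor of k + 1.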

Recomposition Strategies

The local SLM takes $n$ frontier responses to decontextualized fragments and reconstructs a coherent answer to the original query $q$. Options include direct synthesis (concatenate and prompt), hierarchical merging (MapReduce for answers), retrieval-augmented assembly (treat sub-responses as a local document store), and plan-guided assembly (the decomposer generates a recomposition skeleton alongside the sub-queries).
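Hierarchical merging is the easiest of these to sketch. The version below assumes a `merge` callable that stands in for one local SLM call over a small group of responses; here it is replaced by string joining so the example runs without a model. The function name and the `fan_in` parameter are hypothetical.

```python
from typing import Callable, List, Sequence


def hierarchical_merge(responses: Sequence[str],
                       merge: Callable[[List[str]], str],
                       fan_in: int = 2) -> str:
    """Reduce responses level by level, merging at most `fan_in` items per call."""
    if not responses:
        raise ValueError("need at least one response")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 or the reduction never shrinks")
    level = list(responses)
    while len(level) > 1:
        level = [merge(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]


# Stand-in for a local SLM call: wrap and join, so the merge tree is visible.
def toy_merge(parts: List[str]) -> str:
    return "(" + " + ".join(parts) + ")"


merged = hierarchical_merge(["a", "b", "c"], toy_merge, fan_in=2)
print(merged)
```

Keeping `fan_in` small bounds the input to each merge call, which matters when the local model has a short context window.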

Why the Economics Favor You

Asymmetric economics. An open-source AQD toolkit costs negligible compute per query. The provider must filter all incoming traffic. The user only needs to succeed on average; the provider must succeed on every query. Cheap offense, expensive defense.

Recomposition is the bottleneck. All recomposition happens locally. This bounds capability to whatever the local SLM can handle — the primary target for improvement.

Co-evolutionary game. Providers will adapt — clustering sub-queries by API key, correlating timestamps, training classifiers to detect decomposed traffic. The decomposer must evolve in response. This is a minimax game: the decomposer is a generator, the provider's filter is a discriminator. By the classical GAN result [5], the optimum is sub-queries indistinguishable from normal API traffic.

Learned Decomposers: The End State

The heuristic strategies above are starting points, not the endgame. The end state is a learned decomposer — an SLM fine-tuned end-to-end via RL or adversarial training to maximize the AQD score. This requires a differentiable or reward-based proxy for TSD, which does not yet exist. The heuristic baselines provide initialization points and capability lower bounds.

Toward Empirical Validation: OLMo as a Testbed

Everything above is theoretical. The obvious objection: does any of this actually work, even under easy conditions?

The key insight is that TSD measurement becomes trivial when you have the training data. In the real world, you don't — the provider's corpus is proprietary. But we can construct a controlled environment where the training data is fully known, measure TSD directly against it, and establish whether the pipeline works at all before worrying about the harder setting.

Why OLMo

The OLMo family from the Allen Institute for AI (AI2) is the only model ecosystem where everything is open: weights, training data, training code, intermediate checkpoints, and evaluation infrastructure [6, 7, 8]. Not "open-weights" in the Llama sense — full reproducibility.

The critical asset is Dolma [9] — 3–6T tokens from Common Crawl, Refined Web, StarCoder, C4, Reddit, Semantic Scholar, arXiv, StackExchange. This is the pre-training corpus, publicly available. With Dolma in hand, TSD reduces to a direct question: does what I sent the provider look like something they already had?

A First Set of Experiments

The setup is deliberately simple: two models, one corpus. OLMo-2-1B as the local SLM $S$ (decomposer/recomposer). OLMo-2-32B as the frontier model $M$. Dolma as the reference corpus.

The query set is 2,000 queries stratified across reasoning (GSM8K, ARC), coding (HumanEval, MBPP), factual QA (TriviaQA, Natural Questions), and open-ended tasks (AlpacaEval).

For each decomposition strategy, the protocol is straightforward:

Decompose each query $q$ into sub-queries $(q_{1}, \dots, q_{n})$ using $S$.
Submit sub-queries to $M$, collect responses.
Recompose locally using $S$ to produce $\hat{a}$.
Measure RQP: compare $\hat{a}$ against $M(q)$, using exact match for verifiable tasks and LLM-as-judge for open-ended ones.
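For the exact-match step, some normalization is usually needed before comparing strings. The sketch below assumes a common QA-style normalization (lowercasing, stripping punctuation and English articles); the post does not specify one, so the rules and function names here are an assumption.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace.

    Caveat: stripping punctuation also removes decimal points and minus
    signs, so numeric tasks would need a dedicated comparison instead.
    """
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> bool:
    return normalize_answer(prediction) == normalize_answer(reference)


same = exact_match("The answer is 42.", "answer is 42")
different = exact_match("41", "42")
print(same, different)
```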

Measuring TSD — directly against the training corpus. Since we have Dolma, we can check whether the ransom letter actually works — whether the clippings look like newspaper text the provider already had. No fine-tuning needed:

Perplexity under the pre-training distribution. If decomposition pushes sub-query perplexity down toward the corpus baseline, camouflage is working.
Embedding distance to training data. If sub-queries land close to Dolma in embedding space, they are redundant to the provider.
Deduplication survival. Run sub-queries through Dolma's own deduplication pipeline (MinHash, n-gram overlap) — if they'd be filtered out as near-duplicates, TSD = 1 by construction.
Reconstruction resistance. Prompt $M$ to reconstruct the original query from the sub-queries and responses — if it can't, the fragments don't collectively leak intent.
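The deduplication-survival check can be prototyped without any special tooling. This sketch uses exact Jaccard similarity over word n-grams, which is the quantity MinHash approximates; it is not Dolma's actual pipeline, and the threshold, n-gram size, function names, and example texts are assumptions for illustration.

```python
def shingles(text, n=3):
    """Set of word n-grams; short texts fall back to a single shingle."""
    tokens = text.lower().split()
    if not tokens:
        return set()
    if len(tokens) < n:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def would_be_deduplicated(sub_query, corpus_docs, threshold=0.8, n=3):
    """True if the sub-query is a near-duplicate of any reference document."""
    q = shingles(sub_query, n)
    return any(jaccard(q, shingles(doc, n)) >= threshold for doc in corpus_docs)


def dedup_rate(sub_queries, corpus_docs, threshold=0.8, n=3):
    """Fraction of sub-queries a near-duplicate filter would drop."""
    if not sub_queries:
        raise ValueError("need at least one sub-query")
    dropped = sum(would_be_deduplicated(sq, corpus_docs, threshold, n)
                  for sq in sub_queries)
    return dropped / len(sub_queries)


# Invented one-document corpus and two sub-queries.
corpus = ["what is the capital of france"]
rate = dedup_rate(
    ["What is the capital of France", "design a cryogenic valve controller"],
    corpus,
)
print(rate)
```

Comparing against every document is quadratic and only workable on a sample; at corpus scale this is where MinHash with locality-sensitive hashing would take over.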

What this proves and what it doesn't. A positive result proves the AQD pipeline is mechanically sound under ideal conditions: known training data, known model, no adaptive adversary. Extending to real frontier providers with unknown corpora and adaptive filtering — where TSD must rely on proxy measurements like MIN-K%++ [10, 11] or dataset-level inference [12] — is future work.

Collaborate

I'm looking for interns and collaborators to work on this project — reach out at jigarkdoshi@gmail.com.

— Jigar Doshi, March 2026

References

[1] Lee et al. "Deduplicating Training Data Makes Language Models Better." ACL 2022.

[2] Penedo et al. "The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only." NeurIPS 2023.

[3] Zou et al. "Universal and Transferable Adversarial Attacks on Aligned Language Models." 2023. arXiv:2307.15043 (GCG)

[4] Howe & Nissenbaum. "TrackMeNot: Resisting Surveillance in Web Search." Lessons from the Identity Trail, 2009.

[5] Goodfellow et al. "Generative Adversarial Nets." NeurIPS 2014.

[6] Groeneveld et al. "OLMo: Accelerating the Science of Language Models." ACL 2024. arXiv:2402.00838

[7] OLMo Team. "2 OLMo 2 Furious." 2025. arXiv:2501.00656

[8] OLMo Team. "OLMo 3." 2025. arXiv:2512.13961

[9] Soldaini et al. "Dolma: An Open Corpus of Three Trillion Tokens for Language Model Pretraining Research." ACL 2024. arXiv:2402.00159

[10] Shi et al. "Detecting Pretraining Data from Large Language Models." ICLR 2024. (MIN-K%)

[11] Zhang et al. "MIN-K%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models." ICLR 2025.

[12] Maini et al. "LLM Dataset Inference: Did you train on my dataset?" NeurIPS 2024.