Lessons from an AI-Assisted Content Migration

Discussion of AI is all around us, but in my experience, practical guidance rooted in specific use cases is surprisingly rare. After spending months deep in the weeds of a massive documentation migration with AI as my assistant, I’ve learned some hard-won lessons that I think others could benefit from. 

If you work in content engineering, technical documentation, or are simply curious about how AI holds up in a complex, real-world project, here’s my take on what worked and what didn’t.

Project context

I’m a DITA Information Architect on the Information Experience team at Splunk. DITA, short for Darwin Information Typing Architecture, is an open, XML-based standard for structuring and managing technical content. 

We recently wrapped up the migration of three large documentation sites into a single help portal, powered by a DITA-based component content management system (CCMS). The timeline was tight, and nearly all of the resources were internal. The migrations were complex and significant to the business, requiring careful planning and execution.

I originally planned only to support the migration of the smaller, unversioned site. When that went well, I was asked to lead the much larger second migration. (The third site was handled by another team.) Together, these two migrations meant grappling with roughly 30,000 HTML files, two very different site architectures, and the challenge of customizing an existing Python migration script to fit the content at hand, while also putting processes in place for writers to review and clean up their content.

I want to be clear that AI did not complete this project for me. It enabled me to work faster and more efficiently, but only because I still did the planning, architecting, and troubleshooting. Used effectively, AI became a power tool that dramatically sped up delivery, but it never replaced the need for expertise or oversight.

Throughout this project, I used the then-current GPT-4 models through an internal Cisco chat-based deployment. These days, I work more in editor-based tools such as GitHub Copilot. Still, the lessons I learned should apply to the present (mid-2025) state of the art, with a few caveats that I mention where relevant.

How I used AI effectively

Prompting

One lesson I learned early on was to treat prompts the way I approach technical documentation: clear, consistent, and comprehensive. Before consulting the AI, I’d sketch out what needed to happen, then break it down into granular steps and write a prompt that left as little to the imagination as possible. 

If I wasn’t sure about the solution, I’d use the AI as a brainstorming partner first, then follow up with a precise prompt for implementation.
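
To make that concrete, here’s the shape such a prompt might take. The details below are illustrative rather than a verbatim prompt from the project:

    You are helping me extend a Python 3 migration script.
    Context: the script reads exported HTML files and writes DITA topic XML.
    Task: add a function that splits any topic body longer than 20 paragraphs
    into separate child topics, one per H2 section.
    Constraints: use only the standard library plus lxml, follow the existing
    snake_case naming, and do not modify any other function.
    Output: only the new function, with comments explaining each step.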

Iterative development

The migration automation wasn’t a single script but became a suite of Python tools that crawl navigation trees, fetch HTML, convert to DITA XML, split topics into smaller units, map content, and handle version diffs. Each script started small, then grew as I layered in features.
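
To give a sense of what one of those steps looks like, here’s a simplified sketch of a fetch-and-convert stage: download a page and wrap its title and paragraphs in a minimal DITA topic. This is an illustration rather than the actual migration code, and the library choices (requests and BeautifulSoup) are assumptions for the sake of the example:

    import requests
    from bs4 import BeautifulSoup
    from xml.sax.saxutils import escape

    def fetch_page(url):
        """Download one HTML page from the legacy site."""
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        return response.text

    def html_to_dita_topic(html, topic_id):
        """Wrap a page's title and paragraphs in a minimal DITA topic."""
        soup = BeautifulSoup(html, "html.parser")
        title = soup.title.get_text(strip=True) if soup.title else topic_id
        paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
        body = "\n".join(f"    <p>{escape(text)}</p>" for text in paragraphs if text)
        return (
            f'<topic id="{escape(topic_id)}">\n'
            f"  <title>{escape(title)}</title>\n"
            f"  <body>\n{body}\n  </body>\n"
            "</topic>\n"
        )

The real scripts layered many more concerns on top of a step like this, including navigation crawling, topic splitting, ID mapping, and version diffs, which is exactly why the incremental approach described next mattered.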

I quickly learned that asking AI to rewrite a large script all at once was a recipe for bugs and confusion. Instead, I added functionality in small, well-defined increments. Each feature or fix got its own prompt and its own GitLab commit. This made it easy to roll back when something went sideways and to track exactly what each change accomplished.

Debugging

Even with good prompts, AI-generated code rarely worked perfectly on the first try – especially as the scripts grew in size. My most effective debugging tool was print statements. When the output wasn’t what I expected, I’d sprinkle print statements throughout the logic to trace what was happening. Sometimes I’d ask AI to re-explain the code line by line, which often revealed subtle logical errors or edge cases I hadn’t considered.
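
To show what I mean, here’s a hedged sketch of that kind of instrumentation. The function and field names are invented for the example, not taken from the actual scripts:

    def split_topic(topic, max_sections=5):
        """Split an oversized topic into smaller chunks (illustrative logic only)."""
        sections = topic.get("sections", [])
        print(f"[debug] splitting '{topic.get('id')}' with {len(sections)} sections")
        chunks = [
            sections[i:i + max_sections]
            for i in range(0, len(sections), max_sections)
        ]
        print(f"[debug] produced {len(chunks)} chunks, sizes {[len(c) for c in chunks]}")
        return chunks

Crude as it is, a pair of print statements like these quickly shows whether the splitting logic or the upstream data is at fault.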

Importantly, this wasn’t just about fixing bugs; it was also about learning. My Python skills grew immensely through this process, as I forced myself to really understand every line the AI generated. If I didn’t, I’d inevitably pay the price later when a small tweak broke something downstream.

These days, I lean on an AI-powered integrated development environment (IDE) to accelerate debugging. But the principle is unchanged: don’t skip instrumentation and verification. If the AI can’t debug for you, fall back on print statements and your own ability to trace the problem to its source. And always double-check any AI-generated code.

AI as an implementer, not inventor

This project taught me that AI is fantastic at taking a well-defined idea and turning it into working code. But if you ask it to design an architecture or invent a migration strategy from scratch, it will probably let you down. My most productive workflow was to (1) design the process myself, (2) describe it in detail, (3) let the AI handle the implementation and boilerplate, and (4) review, test, and refine the AI output.

Version control

I can’t stress enough the importance of version control, even for simple scripts. Every time I added a feature or fixed a bug, I made a commit. When a bug appeared days later, I could walk back through my history and pinpoint where things broke. Sure, this is basic software engineering, but when you’re working with AI, it’s even more critical. The velocity of change increases, and your own memory of each modification is inevitably less exhaustive.

The net effect of these practices was speed without chaos. We delivered far faster than we could have otherwise, and the quality of the output meant significantly less post-migration cleanup.

Where AI fell short

As valuable as AI was, it had many shortcomings. The cracks started to show as the scripts grew in size and complexity:

  • Context limits: When scripts got longer, the AI lost track of earlier code sections. It could add new standalone features, but integrating new logic into existing, interdependent code? That often failed unless I spelled out exactly where and how to make changes. I should note that today’s newer models with larger context windows might reduce some of the issues I ran into with the migration scripts. But I suspect that it’s still important to be as specific as possible about what sections need to be updated and with what logic.
  • Failure to find a working implementation: I found that sometimes the AI simply couldn’t solve the problem as outlined in the prompt. If I asked for a change and it failed three or four times, that was usually a signal to step back and try something different – whether that meant prompting for an alternate approach or writing the code myself.
  • System understanding: Certain bugs or edge cases required a solid understanding of our systems, like how the CCMS handles ID values, or how competing case sensitivity rules across systems could trip things up. This is a crucial area where AI could not help me. 

What I’d do differently next time

Here’s what I’d do if I had it all to do over again:

  • Plan core libraries and conventions early: Decide on your stack, naming schemes, and file structure at the outset and include them in every prompt. Inconsistencies here led to time wasted refactoring scripts midstream. That said, working in an editor-based tool that’s aware of your entire pipeline will help to keep your libraries consistent from the outset.
  • Sanitize everything: File names, IDs, casing, and other seemingly minor details can cause major downstream problems. Include this guidance in your prompting boilerplate. (See the sketch after this list for the kind of sanitization I mean.)
  • Account for custom content: Don’t assume all docs follow the same patterns and definitely don’t assume the AI understands the nuances of your content. Find out early where the outliers are. This upfront work will save you time in the long run.
  • Document the complex stuff: For any logic that takes more than a few minutes to understand, write down a thorough explanation you can refer back to later. There were times I had to re-analyze complicated parts of the scripts weeks later, when a detailed note would have put me right back on course.
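
On the sanitization point above, here’s a minimal sketch of the kind of helper I mean. The exact rules depend on your CCMS and file systems, so treat the specifics as assumptions:

    import re
    import unicodedata

    def sanitize_id(raw):
        """Normalize arbitrary text into a lowercase, ASCII-safe ID."""
        # Strip accents, lowercase, and collapse anything unsafe into hyphens.
        ascii_text = unicodedata.normalize("NFKD", raw).encode("ascii", "ignore").decode()
        cleaned = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
        # Many XML-based tools reject IDs that start with a digit, so prefix those.
        return cleaned if not cleaned[:1].isdigit() else f"id-{cleaned}"

    # Example: sanitize_id("Überwachung & Alerts.html") -> "uberwachung-alerts-html"

Applying one helper like this to every file name and topic ID, from the very first script, heads off the casing and ID problems that otherwise surface much later.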

One non-AI tip: keep copies of your source and converted markup in a repository even after uploading the converted content to your production tooling. I promise that you’ll need to refer back to them.

AI as a partner, not a replacement

Reflecting on the project, I can emphatically say that AI did not replace my critical thinking. Instead, it amplified my skills, helping me work at a speed and scale that would have been difficult to achieve alone, while streamlining the post-migration cleanup. But anytime I leaned too heavily on AI without careful planning, I wasted time and had to backtrack.

The real value came from pairing my domain knowledge and critical thinking with AI’s ability to iterate quickly and implement. Used thoughtfully, AI helped me deliver a project that became a career milestone.

If you’re facing your own daunting migration, or just want to get more out of AI in your workflow, I hope these lessons save you some pain, and maybe even inspire you to take on a challenge you might have thought was too big to tackle.

 
