Building an AI-Powered Transcript-to-Comic Mini App with Crypto Payments

Project Details

Industry

Crypto / Web3

Engagement Type

AI Consulting / Software Consulting

Team

Full-stack Engineer, UX Designer

Duration

~1 month

Post by Miloš Smiljanić

Reading time: 5 minutes

A Web3 technology company approached us with an ambitious objective: they wanted to showcase the capabilities of their new transaction protocol through a real product experience rather than a static technical demo.

The concept was simple but powerful: users would transform transcripts, such as those from earnings calls or hearings, into AI-generated comic strips inside a Mini App, then publish and share them online. At the same time, the application needed to showcase the protocol’s broader ecosystem, including search, image generation, and wallet-based payments.

The Challenge

This project combined several layers of complexity.

The protocol’s SDK was still under development, so endpoints and documentation changed frequently and created integration risk. At the same time, the application had to orchestrate multiple systems (search, LLM processing, image generation, persistence, and payments) within a user flow that needed to feel instant.

Key risks included:

breaking API changes during development
latency across chained AI operations
payment friction for users unfamiliar with crypto wallets

To address these, we focused on architecture, UX transparency, and progressive interaction design.

What We Built

We delivered a public Mini App that guides users through a multi-stage workflow while keeping the experience intuitive and fast.

Users can:

search or paste transcripts
generate structured comic concepts panel by panel
produce images for each panel
assemble and publish a comic to a public feed

Optional paid upgrades include HD export, watermark removal, and additional panels, unlocked through wallet sessions integrated directly into the Mini App.

Despite the complexity behind the scenes, the full process from transcript input to shareable comic typically completes in one to two minutes.

Architecture and Approach

The solution was built on Next.js 15, which served as both the frontend (styled with Tailwind CSS) and the orchestration layer. PostgreSQL stored comic metadata, and Render hosted the deployment. The protocol’s SDK (ATXP) powered transcript search, image generation, and wallet payments, while OpenAI handled concept generation.

A critical architectural decision was to introduce an orchestration adapter layer that insulated application logic from SDK volatility. Instead of calling SDK endpoints directly, the UI relied on stable internal functions whose mapping to the SDK could be updated whenever the APIs changed. Feature flags added further flexibility by allowing individual integrations to be switched on or off.
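Below is a minimal TypeScript sketch of such an adapter layer. The source does not show the real ATXP SDK surface, so several pieces are hypothetical stand-ins:

- the `ProtocolAdapter` interface
- the generic `call(endpoint, payload)` client
- the endpoint strings and response field names

The point is that only the adapter implementations would need remapping when the SDK changes.

```typescript
// Hypothetical result shape; the real ATXP SDK types are not shown in the source.
interface TranscriptHit {
  id: string;
  title: string;
  excerpt: string;
}

// Stable internal contract the UI depends on.
interface ProtocolAdapter {
  searchTranscripts(query: string): Promise<TranscriptHit[]>;
  generateImage(prompt: string): Promise<{ url: string }>;
}

// Assumed stand-in for whatever client the SDK actually exposes.
interface RawSdkClient {
  call(endpoint: string, payload: unknown): Promise<any>;
}

// Live adapter: endpoint strings and response fields are hypothetical and are
// the only place that needs remapping after a breaking SDK change.
function createSdkAdapter(sdk: RawSdkClient): ProtocolAdapter {
  return {
    async searchTranscripts(query) {
      const res = await sdk.call("transcripts.search", { q: query });
      return (res?.results ?? []).map((r: any) => ({
        id: String(r.id),
        title: String(r.title ?? "Untitled"),
        excerpt: String(r.snippet ?? ""),
      }));
    },
    async generateImage(prompt) {
      const res = await sdk.call("images.generate", { prompt });
      return { url: String(res.image_url) };
    },
  };
}

// Mock adapter with canned data, so frontend work can proceed without the SDK.
function createMockAdapter(): ProtocolAdapter {
  return {
    async searchTranscripts(query) {
      return [{ id: "mock-1", title: `Mock result for "${query}"`, excerpt: "(mock excerpt)" }];
    },
    async generateImage(prompt) {
      return { url: `https://example.com/mock.png?prompt=${encodeURIComponent(prompt)}` };
    },
  };
}

// Choose an implementation in one place, e.g. based on a feature flag.
function selectAdapter(useMock: boolean, sdk?: RawSdkClient): ProtocolAdapter {
  return useMock || !sdk ? createMockAdapter() : createSdkAdapter(sdk);
}
```

The UI imports only `ProtocolAdapter`. Swapping the mock for the live implementation, or adapting to a renamed endpoint, therefore stays a change confined to one file.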

The AI workflow itself was structured rather than free-form. Transcript segments were converted into JSON with a fixed structure describing panels, captions, and scenes. This allowed consistent rendering, straightforward retries, and better observability.
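The sketch below shows one way to enforce that structure before rendering. Several details are assumptions, loosely based on the `panels` table in the architecture overview:

- the `Panel` and `ComicConcept` shapes
- the `scene` field
- the six-panel cap

A real implementation might use a schema-validation library instead of hand-written checks.

```typescript
// Hypothetical shapes; the production schema is not published.
interface Panel {
  index: number;
  caption: string;
  scene: string;
  prompt: string;
}

interface ComicConcept {
  title: string;
  panels: Panel[];
}

const PANEL_FIELDS = ["caption", "scene", "prompt"] as const;

// Validate raw LLM output. Returns a typed concept or throws, so the caller can retry.
function parseComicConcept(raw: string, maxPanels = 6): ComicConcept {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) throw new Error("concept must be a JSON object");
  const obj = data as Record<string, unknown>;

  const title = obj.title;
  if (typeof title !== "string" || title.trim() === "") throw new Error("missing title");

  const rawPanels = obj.panels;
  if (!Array.isArray(rawPanels) || rawPanels.length === 0) throw new Error("panels must be a non-empty array");
  if (rawPanels.length > maxPanels) throw new Error(`too many panels (max ${maxPanels})`);

  const panels = rawPanels.map((p: unknown, i: number): Panel => {
    const fields = (typeof p === "object" && p !== null ? p : {}) as Record<string, unknown>;
    const out = { index: i, caption: "", scene: "", prompt: "" };
    for (const key of PANEL_FIELDS) {
      const value = fields[key];
      if (typeof value !== "string" || value.trim() === "") throw new Error(`panel ${i}: missing ${key}`);
      out[key] = value.trim();
    }
    return out;
  });

  return { title: title.trim(), panels };
}
```

`index` is assigned from array position rather than taken from the model's own numbering, which keeps panel order stable. A thrown error is the signal to retry the LLM call.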

Streaming played an important role in the user experience. Both concept and image generation were delivered via server-sent events (SSE), so users could watch each panel appear in real time, which significantly reduced perceived latency.
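Here is a minimal sketch of SSE streaming using the Web `ReadableStream` and `Response` APIs, which Next.js App Router route handlers can return. The following are hypothetical:

- the `sseResponse` and `demoPanels` helpers
- the `panel`, `done`, and `error` event names

A real handler would wrap the LLM and image calls rather than a demo generator.

```typescript
// Serialize one SSE event. JSON.stringify (without indentation) escapes newlines,
// so the payload always fits on a single `data:` line.
function formatSseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Stream each item from an async source as a named SSE event, then a final "done"
// (or "error") event. Next.js App Router route handlers can return this Response.
function sseResponse<T>(items: AsyncIterable<T>, eventName = "panel"): Response {
  const encoder = new TextEncoder();
  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      try {
        for await (const item of items) {
          controller.enqueue(encoder.encode(formatSseEvent(eventName, item)));
        }
        controller.enqueue(encoder.encode(formatSseEvent("done", {})));
      } catch (err) {
        controller.enqueue(encoder.encode(formatSseEvent("error", { message: String(err) })));
      } finally {
        controller.close();
      }
    },
  });
  return new Response(stream, {
    headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache, no-transform" },
  });
}

// Hypothetical producer standing in for per-panel concept or image generation.
async function* demoPanels(): AsyncGenerator<{ index: number; caption: string }> {
  for (let i = 0; i < 2; i++) {
    yield { index: i, caption: `Panel ${i + 1}` };
  }
}

// e.g. in a hypothetical app/api/images/route.ts:
// export async function GET() { return sseResponse(demoPanels()); }
```

On the client, a browser `EventSource` can subscribe to the named `panel` events and render each panel as it arrives.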

Client (Next.js App)
├─ /api/search              → ATXP.Transcripts.search(query)
├─ /api/concept [SSE]       → OpenAI: transcript → comic JSON (panels[])
├─ /api/images [SSE]        → ATXP.Images.generate(panel.prompt)
├─ /api/publish             → save comic + panels (PostgreSQL), return public slug
└─ /api/payments            → ATXP/Base wallet session (optional paid features)

PostgreSQL
├─ users
├─ comics (id, user_id, title, transcript_ref, status, cost_cents, created_at)
└─ panels (comic_id, index, caption, prompt, image_url, duration_ms)

ATXP SDK
├─ Transcripts.search
├─ Images.generate
└─ Payments/session

Render (Hosting & Deployment)
├─ Web service (Next.js)
└─ DB service (managed PostgreSQL)
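The `/api/publish` route returns a public slug, but the source does not say how slugs are built. The format below is an assumption: a readable prefix derived from the title, followed by a random hex suffix.

```typescript
import { randomBytes } from "node:crypto";

// Build a URL-safe slug: readable prefix from the title plus a random hex suffix.
function makePublicSlug(title: string, suffixBytes = 4): string {
  const base = title
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // drop combining accents left by NFKD
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse everything else into single hyphens
    .replace(/^-+|-+$/g, "")
    .slice(0, 48)
    .replace(/-+$/g, ""); // slicing may leave a trailing hyphen
  const suffix = randomBytes(suffixBytes).toString("hex");
  return base ? `${base}-${suffix}` : suffix;
}
```

A random suffix makes collisions unlikely but not impossible. A unique constraint on wherever the slug is stored would remain the actual guarantee.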

Implementation

Development followed an iterative process that prioritized speed without sacrificing stability.

The implementation included:

early repository setup with mock data to unblock frontend work
transcript search integration through protocol tools
LLM prompt engineering to generate structured comic concepts
image rendering per panel
persistence and publication to a public feed
wallet session integration for optional paid features

Collaboration was equally important. Daily async updates and a dedicated Slack channel enabled rapid clarification cycles and quick responses to SDK changes, keeping delivery on track despite shifting dependencies.

Deployment leveraged preview environments for pull requests and zero-downtime production releases, supported by health checks and environment-level feature flags.
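Environment-level feature flags can be as simple as parsing environment variables at startup. The flag names and the `FEATURE_` prefix below are hypothetical, since the source does not list the project's actual flags.

```typescript
// Hypothetical flag names.
const FLAG_NAMES = ["USE_MOCK_SDK", "ENABLE_PAYMENTS", "ENABLE_HD_EXPORT"] as const;
type FlagName = (typeof FLAG_NAMES)[number];
type Flags = Record<FlagName, boolean>;

// Read flags from variables such as FEATURE_ENABLE_PAYMENTS=true.
// Only "1", "true", "yes", or "on" (case-insensitive) count as enabled,
// so a missing or misspelled value leaves the feature off.
function readFlags(env: Record<string, string | undefined>): Flags {
  const truthy = new Set(["1", "true", "yes", "on"]);
  const flags = {} as Flags;
  for (const name of FLAG_NAMES) {
    const raw = env[`FEATURE_${name}`]?.trim().toLowerCase() ?? "";
    flags[name] = truthy.has(raw);
  }
  return flags;
}

// Usage: const flags = readFlags(process.env);
```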

Reliability, Security, and Observability

Because transcripts can contain sensitive information, privacy and reliability were built into the architecture from day one.

The solution introduced:

server-side masking of PII patterns
transcript truncation and avoidance of raw transcript persistence
compact concept storage instead of original text
correlation IDs and per-stage latency tracking
retries with backoff for transient failures
idempotent image rendering
defined timeout budgets across pipeline stages
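Several of these safeguards can be sketched as small helpers. Everything here is illustrative:

- The regex patterns are a coarse first pass, not complete PII detection.
- The retry counts, delays, and budgets are placeholders, not the project's settings.
- `renderKey` is one assumed way to make image rendering idempotent: it derives a stable key from the comic, panel, and prompt.

```typescript
import { createHash } from "node:crypto";

// Coarse masking of email addresses and North American-style phone numbers.
const PII_PATTERNS: Array<[RegExp, string]> = [
  [/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, "[EMAIL]"],
  [/(?:\+\d{1,3}[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b/g, "[PHONE]"],
];

function maskPII(text: string): string {
  return PII_PATTERNS.reduce((acc, [pattern, label]) => acc.replace(pattern, label), text);
}

// Retry with exponential backoff and full jitter; rethrows after the last attempt.
async function withRetry<T>(
  fn: (attempt: number) => Promise<T>,
  { retries = 3, baseMs = 500, maxMs = 8000 } = {},
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      if (attempt >= retries) throw err;
      const cap = Math.min(maxMs, baseMs * 2 ** attempt);
      await new Promise((resolve) => setTimeout(resolve, Math.random() * cap));
    }
  }
}

// Reject if a stage exceeds its time budget; the timer is cleared either way.
async function withTimeout<T>(work: Promise<T>, ms: number, stage: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${stage} exceeded ${ms} ms budget`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Same comic, panel, and prompt always give the same key, so a retried render
// can be deduplicated (e.g. by reusing an image already stored under that key).
function renderKey(comicId: string, panelIndex: number, prompt: string): string {
  return createHash("sha256").update(`${comicId}:${panelIndex}:${prompt}`).digest("hex");
}
```

`withTimeout` only stops waiting. Cancelling the underlying request would also require passing an `AbortSignal` to the call, where the client supports one.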

Accessibility and sharing were also addressed. Accessibility features included auto-generated alt text and keyboard-friendly streaming updates. For sharing, the app offered built-in posting to social platforms and a public comic feed.
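One assumed approach to alt text and sharing is sketched below. Alt text is composed from the scene description and caption already present in the panel JSON, and the share link uses X/Twitter's web intent URL. Both helpers are hypothetical; the source does not describe how alt text was generated or which platforms were supported.

```typescript
// Assumed panel fields, matching the concept JSON described earlier.
interface PanelText {
  index: number;
  caption: string;
  scene: string;
}

// Compose alt text from data the pipeline already has, so no extra model call is needed.
function panelAltText(panel: PanelText, totalPanels: number): string {
  return `Panel ${panel.index + 1} of ${totalPanels}: ${panel.scene}. Caption: "${panel.caption}"`;
}

// Build an X/Twitter web-intent share link; URLSearchParams handles the encoding.
function shareOnXUrl(publicUrl: string, title: string): string {
  const intent = new URL("https://twitter.com/intent/tweet");
  intent.searchParams.set("text", title);
  intent.searchParams.set("url", publicUrl);
  return intent.toString();
}
```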

Results

The project delivered a fully functional Mini App deployed to production that successfully demonstrated protocol orchestration in a live environment.

Performance metrics from the first few weeks in production showed strong engagement and reliability:

median time to first panel: ~22 seconds
median time to publish (4 panels): ~78 seconds
image retry rate: <3% with automatic recovery
paid feature attach rate: 15–25% depending on cohort
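Metrics like these could be derived from per-stage timings tagged with a correlation ID. The following are assumptions for illustration:

- the `StageTiming` record
- the stage names, borrowed from the API routes
- the in-memory log

A production system would more likely send these timings to a metrics or tracing backend.

```typescript
// Hypothetical stage names, borrowed from the API routes.
type Stage = "search" | "concept" | "images" | "publish";

interface StageTiming {
  correlationId: string; // one ID per comic run (e.g. from crypto.randomUUID()), also added to log lines
  stage: Stage;
  durationMs: number;
}

// Time one async stage and record its duration, even if the stage throws.
async function timed<T>(log: StageTiming[], correlationId: string, stage: Stage, fn: () => Promise<T>): Promise<T> {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    log.push({ correlationId, stage, durationMs: performance.now() - start });
  }
}

function median(values: number[]): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function medianForStage(log: StageTiming[], stage: Stage): number {
  return median(log.filter((t) => t.stage === stage).map((t) => t.durationMs));
}
```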

Beyond performance, the application provided the client with a compelling demonstration tool combining AI generation, wallet payments, and social sharing in a single cohesive experience.

Client Feedback

Although the client did not provide a formal testimonial, feedback from their team highlighted the fast rendering of the first panel, the smooth sharing flow, and the app’s effectiveness as a demonstration of protocol capabilities. The client described the application as “broadly in good shape.”

Miloš Smiljanić

As a Delivery Manager at BlueGrid.io, I specialize in Customer Support, NOC, DevOps, and Project Management, with a strong focus on building reliable operational processes and high-performing teams. My role spans end-to-end delivery, from managing technical support and infrastructure-focused teams to leading development projects from initial planning through execution and launch.

I'm known as a bit of a jack of all trades, master of none, which in practice means I'm comfortable jumping between technical, operational, and organizational challenges and connecting the dots where needed.

Outside of work, I'm a big fan of both PC and board games, and I enjoy spending time gaming with friends whenever I get the chance.