Development · March 12, 2026 · 7 min read

Building Cross-Platform AR: One Codebase for Android, iOS, and Meta Quest

How VisionGuide deploys the same AR-guided workflow to phones, tablets, and XR headsets from a single codebase — and why that matters for hardware companies.

Siva, App Developer at VisionGuide

When we started building VisionGuide's mobile app, we had a choice: build three separate apps for Android, iOS, and Meta Quest — or find a way to share one codebase across all three. We chose the harder path, and it's paying off for every customer we onboard.

Here's why cross-platform AR is harder than regular cross-platform mobile development, and how we solved the key challenges.

Why Cross-Platform AR Is Different

Cross-platform mobile development is a well-solved problem. Frameworks like React Native and Flutter let you build iOS and Android apps from one codebase with minimal platform-specific code. But AR adds layers of complexity that these frameworks weren't designed for:

Camera access is platform-specific. Each OS handles camera permissions, frame capture, and image processing differently. Android's Camera2 API, iOS's AVFoundation, and Meta Quest's passthrough camera have completely different interfaces and capabilities.

3D rendering pipelines differ. Android uses OpenGL ES or Vulkan. iOS uses Metal. Meta Quest uses its own rendering pipeline optimized for stereoscopic display. A 3D model that renders perfectly on one platform may look wrong or perform poorly on another.

Spatial tracking varies wildly. ARCore (Android), ARKit (iOS), and Meta's Insight SDK each have different approaches to understanding the physical environment. Accuracy, latency, and supported features differ across platforms.

Hardware capabilities are uneven. A flagship Samsung phone, an iPad Pro, and a Meta Quest 3 have very different processors, cameras, and rendering budgets. The same AR experience needs to adapt to all of them.

Our Architecture

Rather than using a cross-platform framework and fighting its limitations, we built a shared core with platform-native shells:

The Shared Core

The core logic — hardware recognition, workflow execution, step validation, analytics — lives in a shared layer that runs identically on all platforms. This core:

  • Processes camera frames through the recognition pipeline
  • Manages workflow state (which step, what to show, what to validate)
  • Handles overlay positioning and content rendering
  • Communicates with the backend for workflow updates and analytics

This shared core represents about 70% of the total codebase. When we fix a bug in workflow execution or improve the recognition algorithm, the fix applies everywhere.
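
As a concrete illustration of that boundary, here is a minimal Kotlin sketch in the spirit of our shared core. Every name here (WorkflowEngine, OverlayInstruction, and so on) is illustrative rather than our actual API:

```kotlin
import kotlinx.coroutines.flow.StateFlow

// Hypothetical sketch of the shared core's public surface.
// Platform shells feed in camera frames and input signals; the core
// returns platform-independent instructions about what to draw.

data class Vec3(val x: Float, val y: Float, val z: Float)

// A camera frame handed in by the platform shell.
class CameraFrame(val pixels: ByteArray, val width: Int, val height: Int, val timestampNs: Long)

// One overlay the shell should draw, positioned in the machine-anchored
// reference frame described under "Coordinate Alignment" below.
data class OverlayInstruction(val position: Vec3, val text: String)

// Universal input signals (see "Input Handling" below).
enum class WorkflowSignal { CONFIRM, BACK }

data class WorkflowStep(val index: Int, val instruction: String)

interface WorkflowEngine {
    // The shell pushes frames; the core runs recognition and
    // returns the overlays to render for the current step.
    fun onFrame(frame: CameraFrame): List<OverlayInstruction>

    // The shell translates native input into universal signals.
    fun onSignal(signal: WorkflowSignal)

    // Observable workflow state the shell builds its UI from.
    val currentStep: StateFlow<WorkflowStep>
}
```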

Platform-Native Shells

The remaining 30% is platform-specific code that handles:

  • Camera capture — native code for each platform's camera API
  • 3D rendering — platform-optimized rendering for overlays
  • Spatial tracking — integration with ARCore, ARKit, or Insight SDK
  • UI patterns — following each platform's design conventions

My colleague Logesh and I split the work: I focus primarily on the Android implementation while he handles iOS and Meta Quest. Because the shared core defines clear interfaces, we can work in parallel without stepping on each other's code.
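
Those interfaces are worth making explicit. Here's a hedged sketch, assuming a Kotlin-style codebase and using illustrative names, of the contracts each shell implements:

```kotlin
// Hypothetical sketch of the contracts the shared core defines and each
// platform shell implements. Names are illustrative, not our actual API.

data class Vec3(val x: Float, val y: Float, val z: Float)

// Device pose in the platform's world space; rotation as a quaternion (x, y, z, w).
class Pose(val position: Vec3, val rotation: FloatArray)

// Implemented over Camera2 on Android, AVFoundation on iOS,
// and the passthrough camera on Meta Quest.
interface CameraSource {
    fun start(onFrame: (pixels: ByteArray, width: Int, height: Int) -> Unit)
    fun stop()
}

// Implemented over ARCore, ARKit, or the Insight SDK.
interface SpatialTracker {
    fun devicePose(): Pose
}

// Implemented over OpenGL ES/Vulkan, Metal, or the Quest rendering pipeline.
interface OverlayRenderer {
    fun draw(position: Vec3, content: String)
}
```

Because the core only ever talks to contracts like these, an ARCore quirk stays inside the Android shell instead of leaking into shared code.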

The Real Challenge: Consistency

Building on three platforms is one thing. Making the experience feel identical is another.

When a service manager creates a repair workflow in VisionGuide's web editor, they expect it to look and behave the same regardless of what device the technician uses. "Step 3: Remove the fuser assembly" should highlight the same component, show the same instruction, and validate the same action on an Android phone, an iPad, and a Meta Quest headset.

Achieving this consistency required solving several problems:

Coordinate Alignment

Each platform's spatial tracking system has its own coordinate origin and orientation. A 3D overlay positioned at coordinates (0.5, 1.2, -0.3) on Android might appear in a different position on iOS because the coordinate systems don't align.

We solved this by defining our own coordinate space anchored to the recognized hardware. Once the system identifies the machine, it establishes a reference frame based on the physical device — not the platform's world coordinates. All overlay positions are relative to this reference frame, making them platform-independent.
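
Mechanically, this is a pose transform. The sketch below (illustrative types and names, not our production code) shows the idea: once recognition yields the machine's pose in the platform's world space, an overlay authored in machine-relative coordinates maps into world space the same way on every platform:

```kotlin
// Illustrative sketch of the machine-anchored reference frame.

data class Vec3(val x: Float, val y: Float, val z: Float) {
    operator fun plus(o: Vec3) = Vec3(x + o.x, y + o.y, z + o.z)
}

// Unit quaternion (x, y, z, w).
data class Quat(val x: Float, val y: Float, val z: Float, val w: Float) {
    // Rotate a vector by this quaternion: v' = v + w*t + q_vec x t,
    // where t = 2 * (q_vec x v).
    fun rotate(v: Vec3): Vec3 {
        val (qx, qy, qz, qw) = this
        val tx = 2f * (qy * v.z - qz * v.y)
        val ty = 2f * (qz * v.x - qx * v.z)
        val tz = 2f * (qx * v.y - qy * v.x)
        return Vec3(
            v.x + qw * tx + (qy * tz - qz * ty),
            v.y + qw * ty + (qz * tx - qx * tz),
            v.z + qw * tz + (qx * ty - qy * tx),
        )
    }
}

data class Pose(val position: Vec3, val rotation: Quat)

// Overlay authored in machine-relative coordinates -> platform world space.
// The machine's pose comes from the recognition step; the same overlay
// position works unchanged on ARCore, ARKit, and the Insight SDK.
fun machineToWorld(machinePoseInWorld: Pose, overlayInMachine: Vec3): Vec3 =
    machinePoseInWorld.position + machinePoseInWorld.rotation.rotate(overlayInMachine)
```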

Performance Budgets

A Meta Quest 3 needs to render at 90 FPS, delivering a fresh frame to each eye on every refresh, to avoid motion sickness. An Android phone runs at 30-60 FPS. An iPad Pro can sustain 60 FPS easily.

We implemented adaptive quality scaling — the system automatically adjusts 3D overlay complexity based on the device's rendering budget. On a Quest headset, overlays are simpler but always smooth. On a powerful tablet, they can include more detail. The information content is identical; only the visual fidelity adapts.
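
A minimal sketch of how such a scaler can work. The thresholds and tier names are made up for illustration; a production version would also smooth frame times and add hysteresis so quality doesn't oscillate:

```kotlin
// Hypothetical adaptive quality scaler; numbers are illustrative.

enum class OverlayQuality { LOW, MEDIUM, HIGH }

class QualityScaler(private val targetFps: Int) {
    private var quality = OverlayQuality.HIGH

    // Called once per rendered frame with the measured frame time.
    fun onFrame(frameTimeMs: Double): OverlayQuality {
        val budgetMs = 1000.0 / targetFps
        quality = when {
            frameTimeMs > budgetMs * 1.2 -> stepDown(quality)  // missing budget: simplify overlays
            frameTimeMs < budgetMs * 0.6 -> stepUp(quality)    // plenty of headroom: add detail
            else -> quality
        }
        return quality
    }

    private fun stepDown(q: OverlayQuality) = when (q) {
        OverlayQuality.HIGH -> OverlayQuality.MEDIUM
        else -> OverlayQuality.LOW
    }

    private fun stepUp(q: OverlayQuality) = when (q) {
        OverlayQuality.LOW -> OverlayQuality.MEDIUM
        else -> OverlayQuality.HIGH
    }
}

// A Quest shell might construct QualityScaler(targetFps = 90),
// a phone shell QualityScaler(targetFps = 60).
```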

Input Handling

On a phone, the technician taps the screen to advance steps or confirm actions. On a Meta Quest, they use hand tracking or controller buttons. On a tablet, they might use a stylus.

The workflow engine doesn't care about input method — it receives "confirm" or "back" signals from the platform shell. Each platform translates its native input into these universal signals.
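
In code, the translation layer can be as thin as the sketch below (adapter names and gestures are hypothetical):

```kotlin
// Illustrative sketch: each shell maps native input onto the same
// universal signals; the workflow engine never sees touch events,
// controller buttons, or hand gestures directly.

enum class WorkflowSignal { CONFIRM, BACK }

fun interface SignalSink {
    fun send(signal: WorkflowSignal)
}

// Phone/tablet shell: screen taps (or stylus taps) become signals.
class TouchInputAdapter(private val sink: SignalSink) {
    fun onTapNextButton() = sink.send(WorkflowSignal.CONFIRM)
    fun onTapBackButton() = sink.send(WorkflowSignal.BACK)
}

// Quest shell: hand-tracking gestures or controller buttons become
// the same signals.
class QuestInputAdapter(private val sink: SignalSink) {
    fun onPinchGesture() = sink.send(WorkflowSignal.CONFIRM)
    fun onControllerBackButton() = sink.send(WorkflowSignal.BACK)
}
```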

Why This Matters for Customers

The cross-platform approach has practical benefits that directly affect our customers' operations:

Technicians use what they have. Not every field technician has the same device. Some carry Android phones, others have iPads from their company's IT fleet, and specialized teams might use Meta Quest headsets for hands-free work. One workflow serves all of them.

Headset adoption is gradual. Most organizations start with phones (zero hardware cost — technicians already have them) and explore headsets later for specific use cases like hands-free repair or immersive training. Our platform supports this migration path without rebuilding workflows.

Updates are instant everywhere. When a service manager updates a procedure in the web editor (built by our platform engineer Akash), the change deploys to every device simultaneously. No separate app updates, no version mismatches, no "the Android version has the new procedure but iOS doesn't yet."

Lessons Learned

After building and shipping cross-platform AR for hardware guidance, a few lessons stand out:

Don't abstract too early. Our first attempt tried to create a universal AR abstraction layer that hid all platform differences. It was elegant in theory but leaked platform-specific behavior constantly. The current approach — thin native shells with a shared core — is less elegant but much more maintainable.

Test on the worst device, not the best. Our reference test device is a 3-year-old mid-range Android phone. If the experience is smooth there, it works everywhere. Testing only on flagship devices creates a false sense of performance.

Camera quality varies more than you think. The same machine photographed with different phone cameras can look surprisingly different. Our recognition pipeline needed extensive training on diverse camera inputs — different resolutions, color temperatures, noise levels, and lens distortions.

Offline-first is non-negotiable. Field technicians often work in places with poor connectivity — inside buildings, in basements, in remote facilities. The entire AR experience, including the 3D model and workflow data, must work without an internet connection. Syncing happens when connectivity is available.
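
A rough sketch of the pattern, with hypothetical names and no particular storage backend assumed: reads always hit the local store, writes queue locally, and the queue drains when the shell reports connectivity:

```kotlin
// Illustrative offline-first sync sketch; names are hypothetical.

interface LocalStore {
    fun loadWorkflow(id: String): ByteArray?   // bundled steps + 3D model data
    fun enqueue(event: ByteArray)              // analytics, step completions
    fun drainQueue(): List<ByteArray>
}

interface Backend {
    fun push(events: List<ByteArray>): Boolean // true on success
}

class SyncManager(private val store: LocalStore, private val backend: Backend) {
    // The AR session only ever reads from the local store, so the
    // full experience works with no connection at all.
    fun workflowFor(id: String): ByteArray? = store.loadWorkflow(id)

    // Called whenever the platform shell reports connectivity.
    fun onConnectivityAvailable() {
        val pending = store.drainQueue()
        if (pending.isNotEmpty() && !backend.push(pending)) {
            pending.forEach(store::enqueue)    // keep events if the push fails
        }
    }
}
```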

The Meta Quest Factor

XR headsets deserve special mention because they fundamentally change the interaction model. On a phone, the technician holds the device with one hand and works with the other. On a Quest headset, both hands are free.

This changes what's possible:

  • Two-handed procedures that require holding a component while performing an action become practical with AR guidance
  • Persistent overlays stay visible without the technician needing to aim a phone
  • Larger visual context — the headset's wider field of view shows more of the machine simultaneously

We're seeing early adoption of Quest headsets in training scenarios (where both hands need to be free for practice) and in maintenance situations where the equipment requires two-handed work. As headset prices continue to drop and comfort improves, we expect this to become the primary form factor for AR-guided repair within 3-5 years.

Tags

cross-platform, Android, iOS, Meta Quest, AR development, mobile AR
