ARKit (iPhone face tracking) gives you 52 blendshapes vs the 10-15 a typical webcam tracker detects. The result: micro-expressions like raised eyebrows, lip corners pulling, and authentic surprise faces. Things webcam VTubers can't do.
If you VTube with a webcam, you've probably noticed your character's face looks "okay" but never quite ALIVE. There's a reason: webcam-based face trackers (OpenSeeFace, MediaPipe) detect maybe 10-15 facial movements. Apple's ARKit on iPhone X and newer detects 52.
What ARKit actually tracks
The 52 ARKit blendshapes cover micro-movements webcams miss entirely:
- Inner brow raise (the surprised "oh!" face)
- Outer brow raise (the questioning lift)
- Brow squeeze (concentration)
- Cheek puff (annoyance / chubby cheeks)
- Lip corner pull L+R (asymmetric smile)
- Lip funnel (kissy face)
- Tongue out (yes, anime VTubers love this one)
- Eye look up/down/left/right (independent per eye)
- ...and 30+ more granular controls (jaw, nose sneer, mouth press, eye squint, and so on)
Webcam trackers usually cover: blink, mouth open, head turn, head tilt. Call it 8-10 axes against ARKit's 52.
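If you're curious what those coefficients look like at the API level, here's a minimal Swift sketch reading a few of the 52 values straight from Apple's ARFaceAnchor. The blendshape keys are Apple's real ones; the FaceTracker class and print loop are our own illustration:

```swift
import ARKit

// Minimal sketch: pull a few of the 52 blendshape coefficients from
// ARKit's face anchor. Each value runs 0.0 (neutral) to 1.0 (fully
// expressed). Apps like VTube Studio read these every frame and
// forward them to your Live2D parameters.
final class FaceTracker: NSObject, ARSessionDelegate {
    let session = ARSession()

    func start() {
        // false on devices without face-tracking hardware
        guard ARFaceTrackingConfiguration.isSupported else { return }
        session.delegate = self
        session.run(ARFaceTrackingConfiguration())
    }

    func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
        guard let face = anchors.compactMap({ $0 as? ARFaceAnchor }).first else { return }
        let shapes = face.blendShapes
        let innerBrow  = shapes[.browInnerUp]?.floatValue ?? 0     // surprised "oh!"
        let smileLeft  = shapes[.mouthSmileLeft]?.floatValue ?? 0  // left and right lip corners
        let smileRight = shapes[.mouthSmileRight]?.floatValue ?? 0 // are tracked separately
        let tongue     = shapes[.tongueOut]?.floatValue ?? 0
        print(innerBrow, smileLeft, smileRight, tongue)
    }
}
```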
Why this matters for retention
Viewers stay tuned when a streamer is expressive. Subtle eyebrow lifts during excited moments, lips parting before a laugh, eye darts when reading chat: these are unconscious cues humans pick up on. When your Live2D character DOESN'T do them, viewers feel something is "off" without knowing why.
That difference shows up in long streams: with a proper ARKit rig, streams over 90 minutes see 15-30% better viewer retention than webcam-only tracking.
Setup: cheapest path to ARKit tracking
- iPhone X or newer (any Face ID model works); a refurbished iPhone X starts around $80 in 2026
- VTube Studio app on iPhone ($10 one-time)
- Phone tripod / desk mount at eye level, 30-50 cm away
- Wi-Fi connection on the same network as your PC
- Live2D rig with ARKit blendshape mapping: most premium rigs include this; basic rigs don't (the sketch after this list shows what that mapping looks like)
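What does "ARKit blendshape mapping" mean concretely? Each incoming blendshape drives one or more Live2D parameters. A hypothetical sketch of the kind of table a rigger sets up, using real ARKit blendshape keys on the left and Live2D Cubism's standard parameter IDs on the right (actual rigs usually map most of the 52 and add custom parameters and sensitivity curves):

```swift
// Hypothetical ARKit → Live2D mapping table. Left side: real ARKit
// blendshape keys. Right side: Live2D Cubism standard parameter IDs.
// A production rig maps far more shapes, with smoothing and
// per-shape sensitivity tuning on top.
let arkitToLive2D: [String: String] = [
    "browInnerUp":    "ParamBrowLY",     // inner brow raise
    "eyeBlinkLeft":   "ParamEyeLOpen",   // inverted: open = 1 - blink
    "jawOpen":        "ParamMouthOpenY", // mouth open
    "mouthSmileLeft": "ParamMouthForm",  // smile shape
    "cheekPuff":      "ParamCheek",      // puffed cheeks
]
```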
Without an iPhone: ARKit alternatives
If you don't have an iPhone, three options approximate ARKit but never match it:
- MeowFace (Android) — uses Android face mesh, ~30 blendshapes. Better than webcam, worse than iPhone.
- iFacialMocap PC app + good webcam — extracts ~20 blendshapes from a quality webcam (Logitech Brio recommended). Decent.
- OpenSeeFace — open source, free, ~12 axes. Good baseline.
Does YOUR rig support ARKit?
Open VTube Studio with your model loaded, then go to Settings → Tracking. If you see "ARKit Blendshapes" with most checkboxes mappable, your rig supports it. If you only see basic blink/mouth toggles, the rigger didn't do the ARKit work.
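If you'd rather check programmatically, VTube Studio also exposes a public WebSocket API (default ws://localhost:8001, enabled in its plugin settings) that can list the tracking input parameters available to the loaded model. A rough Swift sketch, assuming the API's documented InputParameterListRequest message and skipping the authentication handshake the real API requires:

```swift
import Foundation

// Rough sketch: ask VTube Studio's public API which tracking input
// parameters exist. Assumes the API is enabled in VTube Studio;
// the real protocol also requires an auth token handshake
// (AuthenticationTokenRequest / AuthenticationRequest), omitted here.
let task = URLSession.shared.webSocketTask(with: URL(string: "ws://localhost:8001")!)
task.resume()

let request = """
{"apiName":"VTubeStudioPublicAPI","apiVersion":"1.0",\
"requestID":"arkit-check","messageType":"InputParameterListRequest"}
"""
task.send(.string(request)) { error in
    if let error = error { print("send failed:", error); return }
    task.receive { result in
        // The JSON reply lists default and custom input parameters;
        // an ARKit-mapped rig exposes far more than blink/mouth basics.
        if case .success(.string(let reply)) = result { print(reply) }
    }
}
RunLoop.main.run() // keep the process alive for the async reply
```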
Most riggers charge $50-150 extra to add ARKit support to an existing rig, which is easier and cheaper than commissioning a whole new model. Or commission an ARKit-ready rig from the start.
What AnimArts ships
Every Live2D rig from Standard tier and up includes full 52-blendshape ARKit mapping by default. We test it with the actual VTube Studio iPhone app before delivery — no "oh, you need to remap that" surprises.
If you bring an existing rig that lacks ARKit, we offer ARKit retrofit at $80 for most rigs.
Get an ARKit-ready Live2D commission →
Bottom line
An iPhone X + ARKit-rigged Live2D model is the single biggest "looks alive" upgrade you can give your VTuber persona. Total cost (a used iPhone at ~$80, the $10 app, an $80 retrofit, plus a mount) starts around $200. The retention boost on long streams pays that back in 1-2 months for most monetised VTubers.
Ready to Get Started?
Get a personalized quote for your project. We respond within 24 hours.