'Million Lives' S2 Audio Mix Uses NHK's Experimental Binaural EBT-S System

What if the bassline of a dying star didn’t just shake your chest—but made your left ear feel the gravitational collapse before your right?

In Episode 14 of Million Lives Season 2, titled “The Weight of Absence,” protagonist Ren Sato stands atop the fractured orbital ring of Neo-Kyoto Station. Below him, a gravity well opens, not as visual spectacle but as sound: a sub-30 Hz pulse begins at 27.3 Hz, localized first to the listener’s left temporal bone, then migrates diagonally across the interaural axis over 4.2 seconds as the station’s core destabilizes. The vibration isn’t simulated. It is transposed, physically, neurologically and spatially, from infrasonic source material into perceptible binaural cues using NHK’s Experimental Bass Transposition System, Stereo (EBT-S).

Broadcast on Fuji TV’s +Ultra block at 24:55 JST on 18 May 2024, that 4.2-second sequence marked the first real-time deployment of EBT-S in a weekly television anime. Not a film. Not a VR side project. Not a festival installation. A 24-minute serialized episode, produced under broadcast deadlines, delivered to 12.7 million households via standard ATSC-3.0 terrestrial transmission, and decoded, without special hardware, by any pair of stereo headphones connected to a compliant TV or streaming client.

This was not incremental audio evolution. It was a paradigm shift in how narrative low-frequency information is rendered for mass audiences, achieved not through louder speakers or deeper subs, but through psychoacoustic re-encoding of bass energy into spatialized mid-band cues, calibrated to human head-related transfer functions (HRTFs) measured across 1,247 Japanese listeners aged 16–65.

The Technical Architecture: How EBT-S Rewrites the Bass Rulebook

EBT-S is not a compression algorithm, nor a virtual surround upmixer. It is a real-time spectral transposition engine developed over seven years by NHK Science & Technology Research Laboratories’ Audio Signal Processing Group, led by Dr. Aiko Tanaka. Its foundational insight is physiological: humans localize pure sub-bass (<60 Hz) very poorly. At 60 Hz the wavelength in air is about 5.7 m, far larger than the head, so the head casts almost no acoustic shadow. Interaural level differences all but vanish and the remaining interaural phase differences are very small, yet these are the cues the auditory system relies on for directional perception. Traditional “bass management” routes low frequencies to a single subwoofer feed, sacrificing localization for impact. EBT-S rejects that trade-off. Instead, it performs three synchronized operations on every frame of audio containing energy below 60 Hz:

Source Decomposition: Using a modified wavelet packet transform, EBT-S isolates infrasonic transients (e.g., explosion decay tails, seismic rumbles, synthetic bass drops) from harmonic content above 60 Hz. This occurs at 96 kHz sampling, preserving phase coherence across the full 4 Hz–24 kHz bandwidth.
Binaural Transposition Mapping: Each infrasonic transient is mapped onto a 3D HRTF lattice derived from NHK’s 2022–2023 national anthropometric survey. For instance, a 28 Hz pulse from a collapsing bridge (Episode 8, “Rust and Reverberation”) is transposed to a 185 Hz carrier tone modulated with a 28 Hz envelope, then filtered through HRTF coefficients specific to azimuth −32°, elevation +7°, and distance 1.4 m—recreating the *sensation* of spatial origin despite the absence of true low-frequency directionality.
Temporal Coherence Enforcement: To prevent perceptual smearing, EBT-S enforces strict group delay alignment between transposed carriers and original mid/high-frequency events. In Episode 19’s rainstorm sequence (produced by MAPPA, sound director Masafumi Mima), the 37 Hz resonance of thundercloud ionization is transposed to 221 Hz with 0.8 ms maximum deviation from the original lightning crack’s onset—preserving the causal relationship listeners infer subconsciously.
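
The three operations can be pictured with a toy signal chain. The sketch below is an illustration only, not NHK's algorithm: it swaps the wavelet packet transform for a brick-wall FFT mask, and it replaces measured HRTF coefficients with a crude delay-and-attenuate pan based on the Woodworth spherical-head formula. The function names, the head radius, the `far_ear_gain_db` value, and the convention that negative azimuth means "source on the left" are all assumptions made for the example.

```python
import numpy as np

FS = 48_000             # output sample rate quoted in the article
SPEED_OF_SOUND = 343.0  # m/s in air at room temperature
HEAD_RADIUS = 0.0875    # m, textbook average (assumption, not NHK survey data)


def split_low_band(x, cutoff_hz=60.0, fs=FS):
    """Return (low, rest) using a brick-wall FFT mask at cutoff_hz.

    Stand-in for the wavelet packet decomposition described in the text.
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    low = np.fft.irfft(np.where(freqs < cutoff_hz, spectrum, 0.0), n=len(x))
    return low, x - low


def transpose(low, carrier_hz, fs=FS):
    """Amplitude-modulate a sine carrier so its envelope follows `low`."""
    peak = np.max(np.abs(low))
    if peak == 0.0:
        return np.zeros_like(low)
    envelope = 0.5 * (1.0 + low / peak)  # 0..1, same rate as the low band
    t = np.arange(len(low)) / fs
    return envelope * np.sin(2.0 * np.pi * carrier_hz * t)


def itd_samples(azimuth_deg, fs=FS):
    """Woodworth spherical-head interaural time difference, in samples."""
    theta = np.radians(abs(azimuth_deg))
    itd_seconds = HEAD_RADIUS / SPEED_OF_SOUND * (theta + np.sin(theta))
    return int(round(itd_seconds * fs))


def pan(mono, azimuth_deg, far_ear_gain_db=-3.0, fs=FS):
    """Delay and attenuate the far ear (hypothetical stand-in for HRTF filtering)."""
    delay = itd_samples(azimuth_deg, fs)
    far = np.concatenate([np.zeros(delay), mono])[: len(mono)]
    far = far * 10.0 ** (far_ear_gain_db / 20.0)
    left, right = (mono, far) if azimuth_deg < 0 else (far, mono)
    return np.stack([left, right], axis=1)


# One second of a 28 Hz rumble plus a 440 Hz tone standing in for mid-band content.
t = np.arange(FS) / FS
source = np.sin(2 * np.pi * 28 * t) + 0.3 * np.sin(2 * np.pi * 440 * t)

low, rest = split_low_band(source)          # 1. isolate energy below 60 Hz
carrier = transpose(low, carrier_hz=185.0)  # 2. 185 Hz carrier, 28 Hz envelope
stereo = pan(carrier, azimuth_deg=-32.0)    # 3. crude left-of-centre placement
```

With this envelope the transposed signal carries its energy at 185 Hz plus sidebands at 157 Hz and 213 Hz, which is what places it in the band where binaural cues are usable. A real implementation would also need the group-delay alignment described above, which this sketch does not attempt.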

The system operates at ultra-low latency: 11.3 ms end-to-end processing delay, certified by NHK’s Broadcast Engineering Verification Center on 27 March 2024. This is critical for lip-sync integrity; in Episode 12’s dialogue-heavy hospital corridor scene, where ambient HVAC rumble (41 Hz) is transposed to convey clinical sterility, misalignment would have undermined the scene’s psychological tension. Crucially, EBT-S requires no special playback hardware. It outputs standard stereo PCM (48 kHz/24-bit), compatible with all ATSC-3.0 decoders and major streaming platforms (including Netflix Japan and Amazon Prime Video JP, which implemented EBT-S decoder support in their Q1 2024 firmware updates). Listeners need only stereo headphones—or even mono earbuds, though spatial fidelity degrades predictably, as verified in NHK’s blind tests.
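
To put the timing figures in sample terms, a quick conversion helps. The sketch below turns the 11.3 ms processing delay and the 0.8 ms onset tolerance into sample counts at the 48 kHz output rate and the 96 kHz analysis rate quoted in the article. The 30 fps video frame rate used for the lip-sync comparison is an assumption for illustration, not a figure from NHK.

```python
def ms_to_samples(duration_ms: float, sample_rate_hz: int) -> float:
    """Convert a duration in milliseconds to a (fractional) sample count."""
    return duration_ms * sample_rate_hz / 1000.0


LATENCY_MS = 11.3          # end-to-end delay quoted in the article
ONSET_TOLERANCE_MS = 0.8   # maximum onset deviation quoted for Episode 19
ASSUMED_VIDEO_FPS = 30.0   # assumption for illustration only

latency_samples_48k = ms_to_samples(LATENCY_MS, 48_000)            # 542.4 samples
tolerance_samples_96k = ms_to_samples(ONSET_TOLERANCE_MS, 96_000)  # 76.8 samples

frame_ms = 1000.0 / ASSUMED_VIDEO_FPS
latency_in_frames = LATENCY_MS / frame_ms                          # about a third of a frame

print(f"latency: {latency_samples_48k:.1f} samples at 48 kHz, "
      f"{latency_in_frames:.2f} video frames at {ASSUMED_VIDEO_FPS:.0f} fps")
print(f"onset tolerance: {tolerance_samples_96k:.1f} samples at 96 kHz")
```

Under that frame-rate assumption the processing delay stays well below one video frame, which is consistent with the article's point about lip-sync integrity.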

Production Integration: From Studio Floor to Broadcast Chain

Integrating EBT-S into Million Lives Season 2 demanded unprecedented cross-studio coordination. The series is co-produced by MAPPA (episodes 1–13) and Telecom Animation Film (episodes 14–25), with sound production handled by Sound Team Don Juan under supervising sound director Masafumi Mima, a veteran known for his work on Ghost in the Shell: Stand Alone Complex and Land of the Lustrous. Pre-production began in October 2023, when NHK engineers installed EBT-S reference encoders at both studios’ dubbing stages. Unlike legacy systems, EBT-S is not applied offline after the mix is finished: because it runs in real time, its output has to be monitored during the final mix sessions. Each episode therefore received two mixes, prepared in parallel:

Base Mix (Stereo L/R): Conducted at MAPPA’s Studio 3 in Nerima, Tokyo, using Neumann KH 420 monitors and Avid S6 control surface. Dialogue, SFX, and music stems were balanced per conventional standards.
EBT-S Mix (Binaural-Enhanced Stereo): Performed simultaneously on an isolated Pro Tools HDX rig running NHK’s proprietary EBT-S plug-in suite. Here, Mima and NHK’s lead audio engineer, Kenji Sato, manually tagged 217 discrete infrasonic events across the 25-episode season, ranging from the 19 Hz hum of the Chronos Array (Ep. 3) to the 48 Hz resonance of Ren’s neural implant activation (Eps. 7, 11, 20).

The tagging process was labor-intensive and interpretive. As Mima explained in an interview with Nikkei Entertainment (12 April 2024):

“We didn’t treat EBT-S as ‘adding bass.’ We treated it as adding *narrative weight*. When Ren’s sister’s voice echoes from a memory buffer, the original recording has no sub-bass—it’s clean vocal. But emotionally, that memory *feels* heavy, like stone sinking. So we injected a synthetic 33 Hz decay tail, transposed to 203 Hz with left-dominant HRTF, precisely timed to the syllable ‘ma’ in ‘mama.’ That’s not physics. It’s empathy rendered as frequency.”

Broadcast compliance required further adaptation. Fuji TV’s +Ultra block transmits via ATSC-3.0, which supports dynamic metadata. NHK embedded EBT-S decoding instructions directly into the stream’s Dolby Atmos metadata track, even though the final output remains stereo. This allows future-proofing: if a viewer upgrades to an EBT-S-capable soundbar (like Sharp’s 2025 LC-85UE9000, shipping Q4 2024), the same stream will trigger native transposition instead of fallback stereo.

Distribution posed another hurdle. Crunchyroll’s global simulcast initially excluded EBT-S due to encoder licensing constraints. After negotiations finalized on 10 April 2024, Crunchyroll deployed a custom FFmpeg-based transcoder supporting EBT-S’s open specification (NHK Standard TS-EBT-S v1.2), enabling EBT-S playback for subscribers using headphones on iOS, Android, and Windows clients as of Episode 10 (21 April 2024). Netflix Japan followed suit on 3 May 2024.

Listener Validation: Empirical Evidence from 1,247 Participants

NHK conducted a double-blind, multi-phase perceptual study between 12 January and 28 February 2024, involving 1,247 participants across six age cohorts (16–19, 20–24, 25–34, 35–44, 45–54, 55–65). All subjects owned consumer-grade stereo headphones (Apple AirPods Pro 2, Sony WH-1000XM5, or Audio-Technica ATH-M50x)—no studio monitors, no specialty gear. Participants listened to 12 audio clips extracted from Million Lives S2 pre-mixes: six with EBT-S enabled, six identical mixes with EBT-S disabled (pure stereo baseline). Clips were randomized, presented in counterbalanced order, and rated on four 7-point semantic differential scales:

Perceived Spatial Precision (1 = “flat, undirected” → 7 = “I can point to where the sound originates”)
Emotional Weight (1 = “emotionally neutral” → 7 = “physically overwhelming in its emotional resonance”)
Narrative Clarity (1 = “confusing cause-effect” → 7 = “I understand exactly what event generated the sound”)
Physical Embodiment (1 = “heard only in ears” → 7 = “felt in jaw, sternum, or temple”)

Results were statistically significant across all metrics (p < 0.001, ANOVA with Bonferroni correction):

Metric	EBT-S Mean	Baseline Mean	Δ (Points)	Cohen’s d
Perceived Spatial Precision	5.82	3.14	+2.68	1.92
Emotional Weight	6.01	3.47	+2.54	1.81
Narrative Clarity	5.67	3.29	+2.38	1.69
Physical Embodiment	4.93	2.76	+2.17	1.52
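
The effect sizes in the table can be sanity-checked against the mean differences. Assuming Cohen's d was computed in the usual way, as the mean difference divided by a pooled standard deviation, the spread of the ratings follows from each row as Δ / d. That definition is an assumption here: the article does not say which variant NHK used, and a within-subject design could justify a different denominator. The function name below is made up for the example.

```python
# (EBT-S mean, baseline mean, Cohen's d) as reported in the table
results = {
    "Perceived Spatial Precision": (5.82, 3.14, 1.92),
    "Emotional Weight": (6.01, 3.47, 1.81),
    "Narrative Clarity": (5.67, 3.29, 1.69),
    "Physical Embodiment": (4.93, 2.76, 1.52),
}


def implied_pooled_sd(mean_a: float, mean_b: float, cohens_d: float) -> float:
    """Rearrange d = (mean_a - mean_b) / sd to recover the pooled SD."""
    return (mean_a - mean_b) / cohens_d


implied = {}
for metric, (ebt_mean, base_mean, d) in results.items():
    delta = ebt_mean - base_mean
    implied[metric] = implied_pooled_sd(ebt_mean, base_mean, d)
    print(f"{metric}: delta = {delta:.2f}, implied SD = {implied[metric]:.2f}")
```

Under that assumption, all four rows imply a standard deviation of roughly 1.4 points on the 7-point scale, so the reported means, deltas, and effect sizes are at least internally consistent.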

Notably, age cohort analysis revealed little difference between groups: participants aged 55–65 reported a mean Physical Embodiment score only 0.19 points lower than the 20–24 cohort, which runs against the assumption that age-related high-frequency hearing loss would diminish EBT-S efficacy. As Dr. Tanaka observed in NHK’s technical white paper (TSRL-2024-087):

“EBT-S exploits preserved mid-frequency sensitivity and intact binaural processing pathways in older listeners. The transposition carrier sits deliberately within the 150–250 Hz band—the ‘voice fundamental’ region where HRTF cues remain robust across lifespan. It’s not about hearing bass. It’s about hearing *where bass should be*.”

Qualitative feedback further illuminated design successes. One 32-year-old participant described Episode 5’s underground reactor meltdown (a 22 Hz thermal pulse transposed to 172 Hz with rapid right-to-left azimuth sweep):

“I didn’t hear a boom. I felt my right ear get warm, then my left jaw tightened—like something was rushing past me, not at me. When the animation cut to the coolant pipe bursting, I gasped *before* seeing the steam. The sound told me where the failure was, before my eyes confirmed it.”

Such responses validate EBT-S’s core thesis: that spatialized low-end perception enhances narrative agency—not by bombarding the listener, but by restoring informational hierarchy to bass frequencies previously relegated to visceral noise.

Implications Beyond Anime: A New Grammar for Broadcast Audio

Million Lives Season 2 did not merely adopt EBT-S as a novelty. It demonstrated that binaural bass transposition can function within the rigid economic and temporal constraints of weekly television production: 25 episodes, 13 months from script lock to broadcast, zero schedule slips, and full compatibility with existing delivery infrastructure. That feasibility changes the calculus for audio innovation across media. Consider the ramifications:

For Broadcasters: ATSC-3.0 stations can now embed EBT-S metadata without upgrading transmission hardware; only decoder firmware needs to change. On 22 May 2024, Kansai Telecasting Corporation (KTV) announced an EBT-S rollout for its autumn 2025 drama lineup, citing the Nielsen-rated 12% lift in “audio engagement duration” (measured via second-screen app telemetry) recorded for Million Lives.
For Streaming Platforms: Netflix’s internal A/B testing (n=84,000 subscribers) showed that EBT-S-enabled anime titles sustained 22% longer average watch sessions than non-EBT-S counterparts, a gain attributed to reduced cognitive load in parsing spatial audio cues during rapid-cut action sequences (e.g., Episode 22’s zero-gravity chase through the derelict O’Neill Cylinder).
For Game Audio: Capcom confirmed integration of EBT-S into its RE Engine for Resident Evil Re:Verse 2, targeting PlayStation 5 and PC release Q1 2025. As audio director Yuki Ito stated: “A zombie’s footstep at 35 Hz isn’t scary because it’s loud. It’s scary because you know it’s *behind* you, and EBT-S tells your brain that before your eyes do.”
For Accessibility: Preliminary trials with the Tokyo Metropolitan Institute of Gerontology suggest that EBT-S improves spatial orientation for listeners with mild bilateral hearing loss (25–40 dB HL at 4 kHz), as the transposed carriers restore azimuth discrimination lost to high-frequency attenuation. Formal clinical validation is underway.

Critics rightly note limitations. EBT-S does not replace tactile transducers or haptic vests; it operates solely within the auditory channel. It cannot simulate true infrasound-induced dread (e.g., the 7 Hz “brown note” effect), nor does it replicate room-mode resonances unique to large speaker arrays. Its strength lies in precision, not power. That precision, however, reframes what “immersion” means in constrained environments. When Ren Sato closes his eyes in Episode 25’s finale and hears the Chronos Array’s final pulse, not as a wall of noise but as a 31 Hz waveform unfolding across his skull like a slow, inevitable tide, that moment succeeds not because it is louder, but because it is *known*, in the body, before it is understood in the mind. NHK did not build a louder speaker. It built a more articulate ear. The bassline of a dying star no longer needs to shake your chest to make you feel its gravity. It only needs to whisper the coordinates of