Elm Media Control - API Proposal

After a really intense discussion with Richard Feldman and the awesome crew at Elm Philadelphia last night, we came up with an alternative media API that’s a big departure from anything I’ve tried before. I wanted to present it here for feedback:

The basic concept is you have an opaque type called Id that you create with a task:
create : Source -> Config -> Task Error Id

Source would be some kind of source, like a Url (or a list of fallback Urls). Config would look something like this: { autoplay : Bool, muted : Bool, loop : Bool }.
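
A sketch of that shape (Media.url here is a hypothetical Source constructor, just for illustration):

type alias Config =
    { autoplay : Bool
    , muted : Bool
    , loop : Bool
    }

backgroundMusic : Task Error Id
backgroundMusic =
    create (Media.url "https://myfile.mp3")
        { autoplay = False, muted = False, loop = True }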

Of course, media has regular state updates, such as the position in the media while it’s playing, or whether it’s ended, so you would subscribe to those changes with:
state : Id -> (State -> msg) -> Sub msg

State being a record representing media state.
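
For illustration, State might be a record along these lines (these particular fields are my guess at the shape, not part of the proposal):

type alias State =
    { playing : Bool -- whether playback is currently running
    , ended : Bool
    , position : Float -- current playback position, in seconds
    , duration : Float
    , volume : Int
    }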

Settings can be changed with:
changeSetting : Id -> Media.Setting -> Task Error State
(It returns the new state after the setting is applied.)

This means that you can change settings easily in your update function:
PlayButtonClicked -> ( model, Task.attempt StateUpdated (changeSetting model.id Media.Play) )

For audio, that’s basically it. The AudioElement isn’t located in the visible DOM; it just plays through your speakers.

For video, we have to actually position the video image somewhere so we need one more function:
video : Media.Id -> List (Html.Attribute msg) -> Html msg
which lets you position and style the video image in the DOM.
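
For example, assuming the Id from create has been stored in the model (the field name and class are made up):

view : Model -> Html Msg
view model =
    div []
        [ Media.video model.playerId
            [ Html.Attributes.class "main-player"
            , Html.Attributes.width 640
            ]
        ]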

There’s some deeper configuration stuff to be dealt with, such as adding subtitle text tracks and creating a stateWithOptions subscription, but this is the basic shape of the thing.

Benefits

  1. Much, much, much easier to keep media state in sync.
  2. There can be no confusion about which media element you’re attempting to play/pause/etc.
  3. Lets us handle the million and one weird edge-case policy differences between browser media APIs as an implementation detail, automatically for users of the library.
  4. Maps nicely to media frameworks on other platforms, if/when Elm is available there. In AVFoundation on iOS, for instance, this API can work pretty much the same, just with slightly different state types.
  5. Better error handling, and it’s easier to make impossible states impossible.
  6. It works really well in the context of the rest of the media APIs. For Media Stream Capture, for instance, we just need to add a captureSource : Config -> Source function and use it to set the source of our player. In Web Audio, we can create a source node like so: audioSourceNode : Id -> AudioNode. (See the sketch after this list.)
  7. Substantially simplifies the eventual implementation of live streaming/adaptive bitrate.
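
To make point 6 concrete, here’s a hedged sketch of how those speculative pieces could compose (captureSource and audioSourceNode are only the names floated above, and this uses the two-argument create from the top of this post):

-- Build a player whose source is a live capture stream
-- (captureSource : Config -> Source, speculative):
playFromCamera : Config -> Config -> Task Error Id
playFromCamera captureConfig playerConfig =
    create (captureSource captureConfig) playerConfig

-- Feed an existing player into a Web Audio graph
-- (audioSourceNode : Id -> AudioNode, speculative):
intoWebAudio : Id -> AudioNode
intoWebAudio id =
    audioSourceNode id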

TL;DR

The API for media playback, in broad strokes, basically looks like this:

create : Config -> Task Error Id

changeSetting : Id -> Setting -> Task Error State

state : Id -> (State -> msg) -> Sub msg

video : Id -> List (Html.Attribute msg) -> Html msg
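
Putting those four functions together, a minimal program skeleton might look like this (the Model fields, Msg names, and config value are all made up for illustration):

type alias Model =
    { playerId : Maybe Id
    , playerState : Maybe State
    }

type Msg
    = Created (Result Error Id)
    | GotState State

init : ( Model, Cmd Msg )
init =
    ( { playerId = Nothing, playerState = Nothing }
      -- config : Config, defined elsewhere
    , Task.attempt Created (create config)
    )

subscriptions : Model -> Sub Msg
subscriptions model =
    case model.playerId of
        Just id ->
            state id GotState

        Nothing ->
            Sub.none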

EDIT: Simplifying further by putting Source into the Config record, so create : Config -> Task Error Id

FURTHER EDIT: Renaming setState to changeSetting

My first Gist is here; it’s a very simple audio player using this API design:
Simple Audio Player Gist

Nice! I like it.

As we spoke about on Slack, there’s a slight modification that might make things nicer.

Instead of

type Setting
    = Play
    | Pause
    | Seek Float
    | SetSource Source
    | Mute
    | Unmute
    | SetVolume Int
    | Loop
    | NoLoop
    | Autoplay
-- and
changeSetting : Id -> Setting -> Task Error State

You could just expose

play : Id -> Task Error State
pause : Id -> Task Error State
seek : Float -> Id -> Task Error State
setSource : Source -> Id -> Task Error State
mute : Bool -> Id -> Task Error State
setVolume : Int -> Id -> Task Error State
loop : Bool -> Id -> Task Error State
autoplay : Id -> Task Error State

and keep Setting and changeSetting hidden.

Which means you could add to the Setting type without a breaking change. You lose pattern matching on Setting, but I’m not sure we want that as an external thing anyway.
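
One nice property of this: each exposed helper can be a trivial wrapper over the hidden changeSetting. A sketch, using the types above:

play : Id -> Task Error State
play id =
    changeSetting id Play

loop : Bool -> Id -> Task Error State
loop shouldLoop id =
    changeSetting id
        (if shouldLoop then
            Loop

         else
            NoLoop
        )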

Things on the API end would look a little cleaner:

Media.play africaByToto
    |> Task.andThen (\_ -> Media.loop True africaByToto)
    |> Task.attempt UpdatedMedia

Compared to

Media.changeSetting thriller Media.Play
    |> Task.andThen (\_ -> Media.changeSetting thriller Media.Loop)
    |> Task.attempt UpdatedMedia

Matt, I really like this suggestion. I thought that the authors of libraries abstracting over it would do that, but the future-compatibility aspect is a really good point. Plus, there really is no reason ever to pattern match on it as a consumer of the type.

This looks really nice 🙂

One comment/query I have is that it seems to totally hide from the user how (and, more importantly, when) a global AudioContext is acquired. Is this intentional? The idea being that the library would try to create a context immediately and then opportunistically attempt to resume it behind the scenes? If not, I think the documentation would need to make clear which operations (i.e. create, or just a changeSetting of Play) trigger the creation of an AudioContext, so that the consumer of the API knows they will only succeed on some browsers if issued in response to a click or keypress. It would also be good to provide some kind of function in the API to manually resume a suspended context.

To give a concrete example: I have been making several simple web games with Elm, currently using ports for audio. The problem I have found myself working around is that the latest behaviour of Chrome means that if a player has visited my site before, I can start the game’s audio straight away (background music, audio effects for on-screen animation, etc.). But if a player has not visited the site before, all audio will be muted, even audio started after they have interacted with the page, unless I specifically resume the AudioContext in direct response to an interaction (e.g. a mouse click). Having something that abstracts this away so I don’t have to deal with it would be ideal, but otherwise having a way to manually request a resume of the AudioContext is really desirable for web games, so that at least for returning players the audio can start immediately.

I realise this is quite a minor detail compared to the basic structure of the API, which looks great. But I just wanted to raise the issue that for some use cases, having API access to make a resume call to the underlying AudioContext is really important (so as to allow optimistically starting audio early and recovering later if the browser doesn’t at first allow it).

I’m laughing to myself because you’re getting a bit ahead of this API proposal. This proposal is about audio and video playback, not manipulation, generation, etc. via Web Audio, but Web Audio has been on my mind, and I plan to post about it shortly (and discuss it in my Oslo Elm Day talk in 3 weeks).

The audio generated through this API can be used as a source node inside an AudioContext, but this API is not yet concerned with AudioContexts. But do not fear, because that’s the next step, and I think this proposal serves it well.

This is a very new policy, but there’s a similar policy for plain audio/video playback that’s a little bit older, and a much older, more restrictive (but more consistent) policy in Safari; my guess is that Firefox will follow suit, probably in the next 12 months. It’s very annoying for developers, and it can make this stuff hard to debug if you don’t know about it, but from a user’s perspective, I believe it’s the right policy.

And I have really good news for you: this design addresses it. I may be able to do more, but at a minimum I can return an Error if you try to play and there hasn’t yet been user interaction.
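
In an update function, that might look like this (NotAllowed is a hypothetical Error variant and needsUserGesture a made-up model field, just to show the shape of the handling):

StateUpdated (Err NotAllowed) ->
    -- Playback was blocked pending a user gesture; show a
    -- "tap to play" prompt instead of failing silently.
    ( { model | needsUserGesture = True }, Cmd.none )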

The Web Audio-specific policy is so new (5 weeks-ish) that I have to go research whether it’s per AudioContext or per load, but if it’s per AudioContext, a similar design for initiating Web Audio will work really well.

I don’t think it’s minor AT ALL. This is the single most frequent issue I encounter when helping developers who don’t spend as much time in the Media API as I do. And I’ve been working hard for months to develop a Task-driven, as opposed to Ports-driven, proposal precisely because the Media API is filled with edge-case gotchas that are really more about familiarity with the API than about talent, and I want to handle the Errors so that people aren’t staring at their screens wondering why their video isn’t playing.

Oops, sorry for jumping the gun! But great to hear that you’re on top of these issues too 🙂 Looking forward to hearing about it in the future…

Some ideas on two fronts. First of all, what would y’all think of renaming Media.Id to Media.Key, reflecting the naming scheme of Browser?

On another front:

Having played with this, I think this opaque type technique is the way to go, not just for these settings but also for Source, because it lets us expand this API in the future. For instance, we could easily add capture streams like this:

Task.attempt UpdatedMedia <|
    changeSource
        (captureStream [ video [ width (exact 1920), height (exact 1080) ] ])

Similarly, we can create single-source media like so:

changeSource (source (url "https://myfile.mp3") |> mp3)

or create a set of fallback URLs like this:

changeSource
    (fallbacks
        [ url "https://myfile.aac" |> aac
        , url "https://myfile.mp3"
        ]
    )

Thoughts?

What does Id represent? Neither Id nor Key really gives me any intuition about what this thing is.

It’s an opaque type representing a reference to the actual mediaElement object, which is managed by the runtime. There can be multiple of them. I am definitely open to suggestions for a better name.

Maybe just call it “Media”?

Naming isn’t a big deal at this stage.

This API is pretty neat looking! But what happens when one Media.Id is used in multiple video tags? Do they all show the same video, with the same images and audio? Is that possible? If so, can you give an idea of the JS code that could make that work?

Yes. The audio only has one destination; the video is a different story.

This is an extremely unusual use case, where you would want two copies of the same video on a page, but I can imagine a couple of niche scenarios where it might be useful, such as building a video mixer.

What I’ve done in this scenario in the past is use canvas. The HTMLVideoElement works extremely well with canvas.

NOTE: This example will only work in Firefox and Chrome because of the format of the video I used; if swapped for an mp4, it would work fine in other browsers. (WikiCommons was just hosting one of my favorite public domain films already.)

const video = document.createElement("video");
video.width = 320;
video.height = 240;
video.src = "https://upload.wikimedia.org/wikipedia/commons/a/a2/Le_Voyage_dans_la_Lune_%28Georges_M%C3%A9li%C3%A8s%2C_1902%29.ogv";

// A canvas element that will mirror the video's frames:
const videoClone = document.createElement("canvas");
videoClone.width = 320;
videoClone.height = 240;

// Redraw the canvas copy once data is loaded and whenever playback (re)starts:
video.addEventListener("loadeddata", drawVideoClone);
video.addEventListener("playing", drawVideoClone);

document.body.appendChild(video);
document.body.appendChild(videoClone);

const playBtn = document.createElement("button");
playBtn.appendChild(document.createTextNode("Play"));
playBtn.onclick = play;

const pauseBtn = document.createElement("button");
pauseBtn.appendChild(document.createTextNode("Pause"));
pauseBtn.onclick = pause;

document.body.appendChild(playBtn);
document.body.appendChild(pauseBtn);

function drawVideoClone() {
    let ctx = videoClone.getContext('2d');
    ctx.drawImage(video, 0, 0, videoClone.width, videoClone.height);
    // Keep copying frames while the video is actually playing:
    if (!video.paused && !video.ended) {
        window.requestAnimationFrame(drawVideoClone);
    }
}

function play() { video.play(); }
function pause() { video.pause(); }

Using canvas also makes me think there are future possibilities to add cool abstractions. For instance, a video filters API, which lets users manipulate the video data on a per-frame basis to do things like green screening or color correction. EDIT: It’s almost certainly better to do this filter API by creating a mechanism to pass a video as a WebGL texture; then you can do any processing you want in WebGL.

One issue with the above method is that it doesn’t copy visible subtitles.

However, since subtitles are exposed as strings, we can draw them ourselves (or the author of the application can, which simplifies the API for enabling/disabling them). This also means we can have a universal look and feel across browsers.
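
A sketch of what that could look like, assuming a hypothetical currentSubtitle : Maybe String field on State:

viewSubtitle : State -> Html msg
viewSubtitle mediaState =
    div [ class "subtitle-overlay" ]
        [ text (Maybe.withDefault "" mediaState.currentSubtitle) ]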

After some reflection, I want to give a different answer. I can’t think of any cases where this is desirable.

The two cases I was thinking of, writing video editing software and/or a live video mixer, are almost certainly better served by copying the video to a canvas or, even better, to a WebGL texture.

Multiple different videos per page? Sure, there are a number of subreddits like that. But the same video, playing simultaneously? I can’t think of any use cases (which doesn’t mean they don’t exist).

One use case that came to mind is having a fullscreen blurred version behind a smaller copy of the same video, in order to mitigate bad resolution. But there might be more elegant solutions.
Edit: typos

This is a good example; it’s helping me clarify some of my thinking.

I think in order to blur it you’d have to use something like canvas to implement this anyway. Actually, I think the scenario Evan laid out is impossible in regular HTML and JavaScript.

From a creative perspective, I’d argue that changing the size of the element would be better than the blur technique, which is generally used to mix different aspect ratios into a single video.

Another way to replicate them is to clone the original media object and then match their settings, but keeping them perfectly synchronized would be a challenge, and caching of the media might be an issue on some browsers.

tl;dr: this clarifies it for me. I think the answer is that it’s not something that exists in JavaScript, the number of use cases is small, and it’s better to be able to bring the video into Elm/WebGL as a texture anyway.

I think maybe this is less about “can we do this?” or “why would we do this?” and more that the API allows it, and therefore what would/should the result be (e.g. if you slip up and reuse an Id, do you end up with a mangled page, or can/should the API account for that somehow)?

I could be wrong, but that question occurs to me as well, so I’m curious what you think 😄

Yeah, I’ve been thinking about this too. I think it’s a question of philosophy/policy. There are two obvious possibilities:

  1. Duplicate the object, which is possible in at least one way illustrated above, and probably more.

  2. Not display anything (or an empty div). Give lots and lots and lots of documentation warnings about this.

Linear types would come in handy here…

I’m leaning towards duplication, but I also think people shouldn’t do it, as an aesthetic choice.

I think duplication is a valid approach for sure, possibly even naively by just having multiple <video>s with the same configuration, rather than trying to mirror it via <canvas> etc.

The trick in that case would be making sure that the state-altering functions apply to every <video> with the given ID (which sounds like pretty simple book-keeping, but I know nothing about Elm internals).

When dealing with media, synchronization is always the challenge, and has been for over a century.

But beyond keeping them in sync, you also don’t want the user to have to load the video twice, as these files can be large. But this may be a solvable problem. I’m going to run some tests; if they can use the same srcObject, this may be easier to solve.

PS: One piece of related film history: it’s often stated that synchronization was the reason that sound film didn’t become mainstream until 1927, but synchronization was solved early on. The real problem was amplification, and thus “talkies” and computers share a common enabling technology: the vacuum tube.