The newest EEG Video live, free-to-attend webinar is now available!
On May 20, 2021, we presented Live Q&A: Everything You Want to Know About Captioning. Bill McLaughlin, VP of Product Development for EEG, and Matt Mello, Sales Associate for EEG, fielded audience questions about closed captioning, accessibility, and more.
Live Q&A: Everything You Want to Know About Captioning • May 20, 2021
Watch this webinar for expert captioning insights! EEG solutions covered include:
- Lexi Automatic Captioning
- Falcon Live Streaming RTMP Encoder
- iCap Translate
- Alta Software Caption Encoder
- HD492 iCap Encoder
- iCap Connect AV650
EEG Video is your source for the latest closed captioning news, tips, and advanced techniques. Visit here to see all previous webinars!
Regina: Hello everybody and thank you so much for tuning in to EEG's webinar today, Live Q&A: Everything You Want to Know about Captioning. My name is Regina Vilenskaya and I'm the Director of Marketing here at EEG. With me on this webinar are Matt Mello and Bill McLaughlin. Matt is the Sales Associate and Bill the VP of Product Development here at EEG. They will both be answering your questions today. With that, I would like to welcome the captioning experts of today's event, Matt Mello and Bill McLaughlin. Welcome!
Matt: Hi everybody!
Matt: Thanks so much for joining us today. I see a lot of questions already coming in and we have a lot to get through, so let's just jump straight into the first question that we have here, which is:
What is closed captioning?
Matt: So closed captioning - really starting base level here. So closed captioning is kind of like a blanket term for anything that allows the viewer at home to enable or disable the captions. So basically, if you're at home and you have your remote and you have the CC button, you turn it on and you see words pop up on the screen, that's closed captioning. Same thing with a video player of any kind there, where you select if you want it on or off. That's actually in opposition to open captions, which is when it's actually part of the video on-screen and you don't have a choice whether it's on or off. So that's kind of the general what is closed captioning.
Bill: Yeah, absolutely, and you're gonna be merging that with a video signal of some sort, and that's kind of where the crux of a lot of today's products come in. How do you get that on the video in a format that you need to use and get it all the way to your viewers at the other end?
Matt: Alright, so the next question is:
What's the difference between closed captions and subtitles?
Matt: So closed captions, like I was saying, are part of the video feed, like Bill said, and they're something that you can enable or disable. I think the distinction with subtitles is that they're similar to open captions, which I think helps some people discern the difference. The terms are sometimes used interchangeably, but they're usually different: closed captions can be enabled or disabled, and subtitles are on-screen.
Bill: Yeah, and usage from that can vary a little bit. I mean, in more of a UK English sense, sometimes people use subtitles really interchangeably with the way Americans use closed captioning. But yeah, broadly, “closed captions” is probably a good phrase to make it clear when you're looking to have it embedded in the video and at the viewer’s control.
Matt: Alright, so our next question is:
What's the difference between live captioning and adding captions to pre-recorded material?
Matt: So live captioning is kind of a specialized field where there's actually somebody or something on the other end of the audio listening and creating live captions on the fly that's embedded into the video and then pushed out to wherever it's going. The difference there is that you're going to see a delay in live captioning a lot of the times. There's a noticeable difference between what's being said and what you're seeing in captioned material, whereas pre-recorded material, it's created beforehand, you can embed it in-line with the video, and it's totally different workflows for how each works.
Bill: And we got some feedback after our last webinar, that we called it 101 but we started at the graduate level, and so hopefully a lot of people just getting started with captioning there can just be a lot of questions where, “We're looking to do something for accessibility. I got tasked with this. What are the choices? What are the different options?” And so hopefully some of this can help frame the problem a little bit. Is the video live? Is it not live? What format is the video in? Are you gonna have closed captions? Are you gonna have open captions? Or you can have a separate companion web page or app that has text only, a separate scoreboard or video monitor that has text only at a live event, and that's the kind of thing that can help you frame, like to really then get into where some of, honestly, probably our other presentations have started, which is which EEG product is most suitable for that.
Matt: Right, so if you're a returning viewer then thank you for rejoining us here. OK, next question is:
What's the benefit of relying on human captioners versus automatic?
Matt: So there's kind of different tiers of how you get your live captioned content. You could think of the bottom tier, or the base tier, as free automatic closed captioning, like sometimes I think YouTube has an integration with that or Webex. The free stuff is at the bottom, usually, and the quality is usually not going to be there, but it's one of those things where you get what you pay for. Another automatic solution, but the next tier up, would be a smart ASR like our Lexi service, which allows you to kind of curate what you want it to. It allows you to input what is going to be said so you can have a better idea of what's going to be captioned. And then your next tier up is the human captioners, the people who are actually listening and can contextually pick out different information that a machine might not be able to just yet. And there's also other things to consider, like budgets. Obviously a huge thing with live captioning is having a big or small budget for each project. Scheduling is another huge thing: with Lexi, with an out-of-the-box solution, you don't need to schedule anything. Obviously a human captioner needs to know beforehand that they're going to be on this, so there's a lot of things to consider when you're going with a human versus automatic.
Bill: Yeah, and human captioners do a lot of prep, too. That's something that not everybody who buys captioning services realizes, but there's definitely a process of, What is this program going to be about? What's the specialized vocabulary in it? And that's something that human captioners, if they're doing a program for the first time, have to put a lot of prep into. And really with automatic it's the same. It's maybe more prep, but certainly the same kind of prep: you need to say what's going to be talked about here that is going to be surprising or specialized to somebody who would have no idea when they walked into my event who these people are, what these things are, what we're talking about, anything like that. That's specialized information that, for a human or an automatic solution, you're going to want to do some prep on.
Matt: Right. Alright, the next question is:
I want to caption my videos. How do I get started?
Matt: So that's kind of a big, broad generalization. There's a lot that can go into that because it's, Is it pre-recorded content? Is it live content that is your videos? But to get started, the easiest way would kind of - if you have no idea coming into this what you need, feel free to contact us. We'll be able to help you. If it's live content, we absolutely specialize in live content. If it's not live, we also have pre-recorded programs that we can do to create captions. So it's kind of a big term, there's a lot that goes into captioning but we can definitely simplify it and kind of walk you through anything that you need. If you see at the bottom right there, there's our Sales email address. Feel free to contact us with any questions like this related to your workflow. OK, the next question is:
I'm looking for live Spanish captioning. Is that possible with any EEG solutions?
Matt: The answer is generally yes, most of the time. Spanish is one of the big languages that's commonly requested, whether that's translation into Spanish or live Spanish captioning where the program audio is in Spanish, so it's a big ask. But yes, live Spanish translation can be done with iCap Translate and Lexi, or with a human captioner and iCap Translate, so generally yes, Spanish is one of the languages that's most easily supported, next to English and French, in the EEG solutions.
Bill: Yeah, and in our Lexi Core Models, Spanish is actually the only language other than English that we offer some pre-curated models in, for Spanish language news programming, so you can use that in addition to any customer models if it's anything that's news- and current affairs-related. So yeah, for the Spanish captioning, I think the biggest questions we'll have for a customer are usually, “Are you looking to do Spanish audio to Spanish text?” So same language audio and text, which would be Lexi if it was automatic. Or if you want to do translation from English to then have a second track of Spanish captions, you can do that automatically using the iCap Translate product, and that can be done whether the original source of English captions is AI or a human transcriber. And that can be some of the stuff that's best about the iCap Translate tool live: being able to add support for some other languages, where depending on your region it can be kind of hard. I mean, Spanish in the United States is obviously not that hard, but for some languages it's harder to find somebody who can do the real-time transcription than for others.
Matt: Alright, next question here is:
How can I caption my Facebook live streams?
Matt: Facebook Live is probably the most common ask for someone doing a live stream. It's definitely up there. It's that and probably YouTube. If you wanted to add captioning to your Facebook live streams, the simplest way to do that would be with our Falcon product where you kind of just take the source stream and instead of sending it directly to Facebook, it would be sent to Falcon first, which would then allow you to add captions and have them embedded into the stream as 608 data, which is readable by Facebook. So from there you can add captions from either a human captioner or through Lexi, or however you're gonna get your captions over iCap. So that's probably the easiest way to do a Facebook live stream.
How do I caption my Zoom meetings?
Matt: So Zoom is a little bit different. We have two different versions of Falcon. One of them specifically integrates with Zoom, the other one's more for actual Facebook live streams, but I’ll hand the technical part over to Bill here, because I know that there's a difference.
Bill: Yeah, and this webinar is through Zoom, as you know, and it's probably a good case study of what you can do with captioning through Zoom using the EEG tools. So for the webinar, we have a tech (say hi to Wes Long out there) who is feeding the audio from the Zoom webinar through a Windows program that we distribute with Falcon called iCap Webcast. And the Windows program will hear the audio from the Zoom webinar and uplink that into the iCap ecosystem, so you can use that with human captioners or you can use it with Lexi. Now if you're using a human captioner, you may not even have to go that far. The human captioner can just go directly into the Zoom by listening to the webinar, but if you're using automatic you need a way to get that audio up there into the iCap ecosystem. So the iCap Webcast program does that for you and then you can turn Lexi and Falcon on and you'll be able to post the captions using a text-based protocol into Zoom. And Zoom controls the way that they're formatted and displayed, so it's a little bit different from something like Facebook where the captions are in the video as closed captions, more similar to a TV or broadcasting kind of workflow. In Zoom, all the different video contributors are just sending their own video, but the audio can be retrieved from the computer of anybody listening on the panel, and at that point you can get the Lexi captions back into Zoom through there.
Matt: Perfect, much better than I would have said it. The next question is:
Why is captioning accuracy measured so often and how is it measured?
Matt: I think because people really want it to be-
Bill: [unintelligible] ...it's not very useful.
Matt: I think people really want it to keep improving. Obviously, that's kind of the number one thing that people are looking for when they're looking at automatic captioning versus a human captioner, is kind of the accuracy and how well you're going to be able to read the text versus what's being said. So I think people are constantly looking for it to be better just because this is a solution that's getting better and better over time, so it's going to be measured that frequently. I don't know how it's measured, Bill.
Bill: Well, I mean, yeah, a lot of regulatory approaches kind of will, again, put video into two categories, where it's either post-produced, pre-recorded or it's live, and when something is pre-recorded, you should really be shooting for something a lot like 100% caption accuracy because you can bring this into an editing suite, you can get the timings right, you can listen to the part multiple times. If there's any difficulty understanding the audio, you can consult reference material. It can really be completely right. With real-time, whether it's human transcribers or AI, you pretty much get one chance, so standards are generally lower. In a lot of countries they use something called an NER score, which is kind of a more complex version of the simplest metric, which would just be what percent of words are right, what percent of words are wrong, and NER is a little more complicated than that because it takes into account how important the words that are missed are to the meaning of the phrase or how misleading they are if they're not right. So the NER is a little more sophisticated. It takes some training to do that. There's official accreditations for doing the NER scoring. For basic uses where the question might just be, “How does a system perform on my videos?” you're not actually going to go too far wrong by just taking a sample of a couple of minutes and looking at how many words are right and how many words are wrong, and that's pretty tedious. People who do it all the time are faster at it, but it's something where you can look at it like, for a live program, if you are at 98% or something, you should probably consider that pretty much as perfect as is going to be accomplished in most environments.
Something like 95% is very good, something like 90% is where you're gonna start to notice the errors more, and when you're starting to float in, let's say, the 85% or below range, that's really something where you need to consider, What do we need to do to improve this? Do we need to go with a premium human service that does better? Do we need to do more vocabulary preparation? Do we need to improve the audio quality of the program that the transcriber is getting? Because at that point, even a number like 80%, which sounds good, is not going to be that good. So it's a very important question, and other things matter in captioning, too, like how much delay the words have, things like the final positioning and appearance of the captions, but probably your number one question about how suitable this captioning system is for a certain video application, your most basic question, is going to be about the accuracy for sure.
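A rough way to run the kind of spot check Bill describes is to compare a short human-verified transcript against the caption output word by word. The sketch below is an illustration only: it is not the NER method and not an EEG tool. The function names `word_accuracy` and `rate` are made up for this example, accuracy is computed as one minus the word error rate from a standard word-level edit distance, and the thresholds simply restate the rough bands mentioned in the answer.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - (word errors / reference words) for two transcripts.

    Errors are counted with a word-level edit distance, so substitutions,
    missing words, and extra words each count as one error. Case is ignored;
    punctuation is NOT stripped in this simple sketch.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the captions
                curr[j - 1] + 1,     # extra word in the captions
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr

    errors = prev[-1]
    return max(0.0, 1.0 - errors / len(ref))


def rate(accuracy: float) -> str:
    """Map an accuracy value to the rough bands described in the webinar."""
    if accuracy >= 0.98:
        return "about as good as live captioning gets"
    if accuracy >= 0.95:
        return "very good"
    if accuracy > 0.85:
        return "errors becoming noticeable"
    return "needs improvement"
```

A couple of minutes of audio is a small sample, so a score from a check like this is best read as a rough indicator rather than a certified measurement.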
Matt: Alright. Next question is:
Is there a difference between closed captioning for streaming versus IP?
Matt: Yeah, so we have different products that kind of - I mean, it depends what type of video you're doing, really. The idea is very similar in how it's going to be embedded, but we have different products for different levels of video streaming. Like Falcon is one of our products that I would say is probably our streaming product, and then for IP we have Alta, which accepts MPEG-TS or 2110. So yeah, there's definitely a difference in how you get it embedded, but I'll let Bill answer the more technical part of that.
Bill: Right, so streaming video goes over IP. IP just means internet protocol, so yeah, streaming over the internet uses IP. When you see these terms applied to captioning products, usually IP video means kind of professional video in a studio using IP, so in the EEG product line that's Alta, which works with MPEG transport streams and SMPTE 2110, and with Amazon CDI for uncompressed video in the cloud. So this is kind of a professional video pipeline like you'd use in a broadcasting plant. It's over IP now because there's been so much movement in that space away from SDI video that uses dedicated hardware to IP video carried over networking equipment and using software products like Alta to do pieces of processing. With streaming, usually someone's talking about a more enterprise, educational, even kind of personal level video where you're going to put it to YouTube, you're going to put it to Twitch, you're going to put it to Facebook, you're going to put it to a platform for businesses like Brightcove, and basically the video is originated maybe just from a laptop stream or from a software studio like Wirecast or vMix, and so then we caption that. Typically we'll recommend using our Falcon product, which is a cloud-hosted product. It does lower bandwidth videos. It's cheaper, it's easier to run, and it's fully infrastructure as a service, so you don't need to install anything on your own equipment or really own any equipment. So that kind of distinction between the more fixed broadcast studio installation and something that's the more ad hoc stream-to-consumer web platform is kind of the difference for captioning and, again, that would be Alta versus Falcon in the EEG land.
Matt: Alright. Next question is:
Why 608 and 708? Setup for closed captioning systems can be confusing.
Matt: They can be confusing, that is absolutely correct. 608 and 708, I think, are - just my understanding is they're just broadcast standards for closed captioning. Beyond that, I'm not sure why they picked the numbers. It's just one of those things that just is. Bill, do you have any more thoughts on that?
Bill: Yeah, for North America 608 was the SD standard and when they moved to digital broadcasting and HD over the air, it became 708. This has kind of stayed very relevant because still when you look at streaming captions, kind of the number one most interoperable way to put streaming video captions to a lot of platforms is to have embedded data in the 708 standard. And the reason I think a lot of people get confused about this labeling is you'll have some products that say they support 708 caption input and you'll have some platforms that say they support 608 caption input, and the question is, Is this the same thing? And from an EEG product user perspective, the answer is really yes, this is the same thing, because your embedded captions in Falcon are going to be a 708 packaging of the captioning. But inside the 708 data there's a little smaller packet chunk and that has all the 608 data. And so a lot of players and platforms just read the little chunk of 608 and they don't read the bigger chunk of 708, but really that's fine when you're talking about inter-platform interoperability because it has your text, it has your positioning, it even has things like colors in it. So really, the data that you need is in the 608 and, really, a lot of the features that came into 708 are things that have never been very widely used. There's more control over the positioning, but on the other hand a lot of web players don't give any control over the end position, they just put all the captions on the bottom. So the 708 features for broadcast are not usually very important in the streaming world and so, really, anything that tells you it supports 608 or 708 captions in the streaming world, you're going to be good with the captions produced by Falcon or by another EEG product for that. It's really all the same difference.
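To make the "608 inside 708" idea concrete, here is a small sketch of how a player might pull only the 608 bytes out of a 708-style caption payload. It is based on the commonly documented `cc_data` layout, where each three-byte entry has a header byte (five marker bits, a `cc_valid` bit, and a two-bit `cc_type`) followed by two data bytes. That layout is stated here as an assumption to check against the standard, the function name `split_cc_triplets` is invented for this example, and nothing here describes how Falcon or any specific player is implemented.

```python
from typing import Iterable, List, Tuple


def split_cc_triplets(
    triplets: Iterable[Tuple[int, int, int]],
) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Separate 608 field-1 byte pairs from 708 (DTVCC) bytes.

    Each triplet is (header, byte1, byte2). Assumed header layout:
      bit 2      -> cc_valid
      bits 1..0  -> cc_type: 0 = 608 field 1, 1 = 608 field 2,
                             2 = 708 packet data, 3 = 708 packet start
    """
    cea608_field1: List[Tuple[int, int]] = []
    dtvcc_bytes: List[int] = []

    for header, byte1, byte2 in triplets:
        cc_valid = (header >> 2) & 1
        cc_type = header & 0b11
        if not cc_valid:
            continue  # padding entry, carries no caption data
        if cc_type == 0:
            # 608 bytes carry a parity bit in bit 7; mask it off.
            cea608_field1.append((byte1 & 0x7F, byte2 & 0x7F))
        elif cc_type >= 2:
            dtvcc_bytes.extend((byte1, byte2))
        # cc_type == 1 (608 field 2) is ignored in this sketch.

    return cea608_field1, dtvcc_bytes
```

A platform that "supports 608" in this picture would keep only the first returned list and discard the second, which is why the same embedded stream can satisfy both kinds of platform.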
Matt: Alright, next question is:
How can I get 608/708 into live streamed automatic speech recognition captions?
Bill: Well, I hope I'm getting this right. I mean, I'd probably frame that a little bit in the opposite workflow and as far as how can I get automatic speech recognition captions into 608/708. And really, that's the crux of our Lexi product. That's mostly what it's being used for, is generating something that's going to produce 608/708 format captions, and it highlights how in captioning you have really two different parts. And one part is, Where do the transcriptions come from? And that's a human steno, a human voice writer, automatic captioning. And then you have, How is the captioning transmitted as part of the video? And that's something like 608/708 and it's embedded in RTMP or HTTP live streaming with Falcon, that's embedded in SMPTE 2110 or an MPEG transport stream with Alta, that's in the VANC space of an SDI signal when you have a 492 hardware encoder. So really all of these would be saying, Let's get the Lexi data and put it in 608/708 captioning. And the only real choice the customer kind of needs to look at is, What's the standard of my video signal that I want the 608/708 data to go into and where is the video at the point of insertion? Is it on-prem in a bunch of SDI gear? Is it in an on-prem software installation, virtual machines and such? Is it on its way to the cloud to go to Facebook? So kind of, Where is the video physically and what codec or standard is it in? And I think that kind of gives you the answer for what kind of products you need to connect Lexi into a 608/708 workflow.
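The product choice described in this answer boils down to a lookup on the video standard at the point of caption insertion. The sketch below just restates that mapping in code as a reading aid. The dictionary keys and the `pick_encoder` function are invented labels for this example, not identifiers from any EEG software, and a real deployment decision would involve more than the transport format.

```python
# Video standard at the insertion point -> EEG encoder family named in the answer.
# Keys are illustrative labels chosen for this sketch.
TRANSPORT_TO_ENCODER = {
    "rtmp": "Falcon",       # cloud-hosted, streaming to web platforms
    "hls": "Falcon",        # HTTP live streaming
    "smpte-2110": "Alta",   # software encoder for IP video
    "mpeg-ts": "Alta",      # MPEG transport stream
    "sdi": "HD492",         # hardware encoder, captions in the VANC space
}


def pick_encoder(video_standard: str) -> str:
    """Return the encoder family for a given video standard label."""
    key = video_standard.strip().lower()
    try:
        return TRANSPORT_TO_ENCODER[key]
    except KeyError:
        raise ValueError(f"no encoder listed for {video_standard!r}") from None
```

In each case the caption source (Lexi, a steno captioner, a voice writer) stays the same; only the insertion point changes.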
Matt: Thank you, Bill. The next question is going to be:
Is there a way to fix a word or two in the embedded 608/708 captions after you've completed the event and edited the recording?
Matt: So I think that comes down to how it's captured once the recording is done. If you can get a transcript of that file, you can bring it into post-production software and edit it and then put it along its way into the final product. So yeah, there's a way to do that as far as an EEG solution. Bill, do we have anything that does that specifically?
Bill: Yeah, if you sort of break it down to a couple of steps you would say, “OK, I have a video, the video has embedded captions. I want to change the embedded captions and then re-put out a video with the changed captions.” So you could do that relatively directly with our Scribe software, which is the post-production software, and it basically comes down to what format is the video file in. Most commonly it'll be something like an MP4 file, maybe MXF in some situations, but what file format is this in? And then Scribe will import those file formats and it will show you the text that's in there. You can make edits to the text, and then you'll re-export the video. And the video will be the same in terms of things like file size, compression type, but the captions will be changed. And because the 608/708 standard is a little bit complicated in terms of how the captions need to be paced out in a very specific way, unfortunately you're really not going to get anywhere by sort of opening up the file in a text editor or something like that and just changing the words in Notepad. But using EEG software or, frankly, a lot of different vendor plug-ins–there's some plug-in support in something like Adobe Premiere that's really pretty good–you should be able to get the job done if you have a caption file.
Matt: Perfect, OK. Next question is:
I'm looking at automated caption options for closed captioning on our scoreboard. What are my options?
Matt: So I know you can do this with our SDI encoders. We have the HD492, you can add a module onto it that allows you to output text over TCP/IP, which is going to be accepted by a lot of common scoreboards like Daktronics and similar scoreboard formats. So that would be a way of doing that with an SDI option. If it's something like an LED, like TV, something that you'd be putting in HDMI, you can do the same thing with the HD492 or similar SDI encoder that we have, and convert the SDI signal into HDMI, which can then be displayed on these screens. So yeah, there's definitely options for how you can do that.
Bill: Yeah, your basic question in setup, I think, is getting a feed of audio inside the venue that has all the audio you're looking to transcribe, so that's probably going to be a combination of public address announcements and anything that comes out, like advertisements or other sources of pre-recorded material that's, let's say, played between innings of the baseball game or anything like that. And that would be the same audio feed that you would need to provide if you were working with a human captioner and they weren't in the stadium. The question would be, What do I need to transmit to this remote person so that they can hear what they need to write? And it's basically the same thing with Lexi, so with a 492 you would need to get the audio track that you wanted to transcribe either embedded into an SDI signal, or you could use an XLR connector to put it in the back of the 492 as an audio-only track. And that audio is going to be encrypted, transmitted to iCap, transcribed through the Lexi Model, and sent back to the encoder, and then the encoder can take an IP address or a serial connection to your scoreboard system and dump that text back into the scoreboard system from there or, as Matt said, if you want to do an open caption thing, the encoder has a decoder output and so you can take open captions from that and feed it to a video board if you have a corresponding video presentation.
Matt: OK, next question here is:
What is Lexi’s accuracy rate? Is there a way to input words, names, etc. into Lexi before an event starts?
Matt: So two separate questions that I'm seeing there. What is Lexi's accuracy rate? We generally rate it around 95% but it's gonna vary higher or lower depending on a) how well you train it beforehand, b) how well it can hear the speaker that the feed is - the cleanest audio signal is going to produce the best captions, basically. So there's a couple of factors that are going to play into what Lexi's accuracy rate is. But it's generally somewhere in the 95-96% range. It can get higher depending on how you're using it. And is there a way to input words, names, etc. into Lexi beforehand? Yes, we have a Topic Models system, we call it, where you can kind of feed it words and names of towns nearby and anything that it might not get otherwise. You can kind of teach it beforehand and that way it's more likely when it hears that word, it's more likely to produce the right word in spelling it out.
Bill: Yeah, and if your program's simple, there may be very little training required. A more complex program with more special cases, you should definitely put some training into that and it's going to improve the result a lot. With some types of programs that, let's say, have very casual speech, a lot of background noise, a lot of crosstalk, you're just probably not going to get an awesome result from a fully AI captioning solution, and so part of it is realizing that the accuracy that you're going to get is going to vary. That's true with human captioners as well, but a way to look at it is, When would we recommend using Lexi captioning? And we'd recommend using Lexi captioning when you're getting an accuracy that's in the mid to high 90s. If you're not going to be able to get that kind of result, it's probably just not that good an application for the technology and it probably makes sense to look at what else you can do for accessibility. The proper names and the things that are great for a Topic Model, that kind of learning and training can get you all the way on a lot of types of programs, but there can be a couple of different reasons why your program isn't a great choice for getting really shiny results on AI captioning, and some of them are very directly addressed by the Topic Models system while others are less directly addressed by that. So it can really depend a lot on the programming. I mean, frankly, not to insult people, but it drives me crazy when vendors in this space say that there's a single percentage number that describes what their system does and no caveats are attached to that at all, no qualifications. It's just silly.
Matt: It's all marketing. OK, next question here is:
Can Lexi work in tandem with a live human captioner if someone wants to use both for greater accuracy?
Matt: The workflow for that would be pretty tricky if my understanding of the technology is correct. They would both be using iCap to communicate the audio over to either system, so switching interchangeably between the two might require a lot of starting and stopping of the program video, so I'm actually not sure how well that would work.
Bill: Yeah, you're going to be using either Lexi or a human captioner or a source like a teleprompter or pre-scripted, pre-prepared material. You're going to be using one of those at a time within the system, so you can stagger usage of them kind of to get the best results available, but it isn't really a real-time correction-based system. Those systems are - you really need a lot of training to be able to use kind of what you'd call a two operator re-speaking correction system, which is done, especially in some global broadcast regions that can be popular. It's a pretty expensive workflow because a lot of the times it involves two highly-trained operators. So currently we're in a mode where you're using one of these sources at a time and if you're having kind of cost savings or operational benefits from staggering, then that is very easy to do. A lot of options for staggering them and calling when you're going to use the automatic captioning from the cloud and when you're going to use other sources and a lot of automatic switching, even, say, you have pre-recorded captions in the video, the 492's can do automatic switching, but they're not both operating at exactly the same time.
Matt: OK, next question here is:
I need to caption meetings that involve proprietary information. Are there any alternatives to cloud-hosted captioning that you can recommend?
Matt: So the first thought that comes to my mind is we do have an on-premises version of Lexi that comes in a 1RU unit that you would hook into your iCap encoder that would be able to create caption data without any internet connection whatsoever. So typically with Lexi on the cloud, you need an internet connection to be able to connect to iCap which then connects over to our cloud services for Lexi. But this box will allow you to do that without any internet connection, which is great. And that's kind of the best way to do the highest privacy is it not touching the cloud whatsoever, so yeah, we definitely do have an option for that and it has relatively the same specifications as our cloud version of Lexi as well. So it's kind of just up to you. If you value privacy that highly or you can't fit the iCap network into your current workflow or network requirements, then that could be a great solution for you.
Bill: Yeah, and it's worth mentioning that workflows like that also - it works with the 492 or with the Alta platforms, which are completely on-prem video handling platforms when you're using them with the Lexi Local system. The Lexi Local system is also compatible with using iCap, human captioning services when needed, and just in that case you would be responsible for keeping the IT services contained. So any third-party captioners who are connecting to it are going to need to have VPN access or be in-person or be otherwise able to access that equipment without going out to public networks because you're keeping the entire workflow contained.
Matt: So next up is:
Do your closed captioning solutions work with Wirecast?
Matt: So whenever I hear Wirecast, I immediately think of Falcon, because you would basically take your Wirecast RTMP output, send it to Falcon, and then send it to the CDN of your choice from there. So yes, Falcon is certainly what I would recommend immediately for Wirecast. If there's other solutions they might work, too, but Falcon is definitely my go-to answer there.
Bill: Yeah, that's probably the most common streaming encoder we see people use with Falcon. Your support onboarder will probably tell you this: when you're embedding captions in the output from Wirecast, use the MainConcept H.264 codec. Do not use the x264 codec. You heard it here first.
Matt: Next question:
Do you have plans to add grouping mode to Falcon, in addition to hardware encoders?
Matt: So I'm not sure what grouping mode is exactly.
Bill: Probably the desire is to send the same captions to a set of screens, with multiple Falcons grouped together on one Access Code. And yeah, you can definitely do that with Falcon. What you have to do procedurally is share your Falcons into an iCap Admin account. Within the Falcon interface on eegcloud.tv, you can basically just create a single Access Code, or an Access Code for each language you want to do, and that just targets that one Falcon. If you want to mix Falcons and 492s or have multiple Falcons that are grouped together, you just need to click the box on the site that says, basically, “Share this into my iCap Admin account,” and provide your company credentials. Then you'll see those Falcon encoders appear in the iCap Accounts page the same as you would with 492s, for anyone that's familiar with using those, and at that point you can make your own groupings and as many combinations of them as you want. That helps with more complicated workflows. For example, let's say you want to do regional redundancy with Falcon. We recently added a couple of new regions to Falcon, so you can now operate the streams out of the east coast of the United States, the west coast of the United States, London, or Sydney. So you can do regional redundancy on the streams and group them together this way.
Matt: Next question is:
Is there any latency introduced putting Falcon in my RTMP path?
Matt: My understanding is that it's extremely minimal. The actual latency that's added by passing your video stream through Falcon is very, very minimal. I usually quote under a second.
Bill: Yeah, I mean, it might be about a second. It really depends on the keyframe rate you set in the video. A typical setting is to have a keyframe rate of about once a second, which will make your pass-through delay on Falcon about one second. Using more frequent keyframes will bring that delay down only because the captions are encoded on that basis. They need to be encoded between the two keyframes, which are basically video frames that have the entire picture worth of information as opposed to being differential coding in the MPEG standards. So if you want less latency, you can originate a slightly higher bandwidth stream that has more keyframes in it and that'll bring down your latency while keeping the captions in good shape.
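The keyframe arithmetic Bill describes can be sketched as follows. This is a rough illustration only, assuming the pass-through delay is approximately one keyframe interval, as stated above; the function names are hypothetical and are not part of any Falcon API.

```python
def keyframe_interval_seconds(gop_frames: int, fps: float) -> float:
    """Seconds between keyframes for a GOP length (in frames) and frame rate."""
    if gop_frames <= 0 or fps <= 0:
        raise ValueError("gop_frames and fps must be positive")
    return gop_frames / fps


def estimated_passthrough_delay(gop_frames: int, fps: float) -> float:
    """Rule-of-thumb delay estimate.

    Assumption (from the discussion above, not a measured figure):
    the added delay is roughly one keyframe interval.
    """
    return keyframe_interval_seconds(gop_frames, fps)


if __name__ == "__main__":
    # A 30 fps stream with a keyframe every 30, 15, and 10 frames.
    for gop in (30, 15, 10):
        delay = estimated_passthrough_delay(gop, 30.0)
        print(f"keyframe every {gop} frames -> about {delay:.2f} s delay")
```

More frequent keyframes shorten the estimate, at the cost of the slightly higher bandwidth mentioned above.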
Matt: Next question here is:
What platforms (other than Zoom) offer ease of streaming real-time captions through their platforms? What is the threshold or capacity for different language CART options that can stream easily through an encoder?
Matt: So if we're talking about platforms in relation to conferencing platforms like Zoom, there isn't too much that you can do to add third-party CART services within these conferencing platforms other than Zoom. I think Zoom is kind of, like, the one exception. Most of your other ones like Webex and Teams and things like that generally don't actually allow for third-party services to tap in there.
Bill: What you can do with Webex and Teams, I believe, is a kind of type-in when you're working with a premium human captioner. It basically winds up being a window that you can type text into, kind of like a glorified chat window but made for closed captions, so it's a good solution if you're using a human captioner. What's tougher is if you want to do something that's software-based and API-automated. The support, even on Zoom, which is honestly one of the better ones, is unfortunately not so strong on most of these meeting platforms for doing something like multiple languages or adding your own vocabulary. Something like Google Meet has automatic captions built in. The quality of the automatic captions is good out of the box, but there's really nothing you can do to add your own vocabulary into it. It just is what it is, so there are some limitations. A lot of times if you want to really make a more classic, professional-quality, multi-language closed caption capability out of a meeting, it makes more sense to treat it as a video stream, to take something like this Zoom meeting and put it through a software video switch, something like Wirecast or vMix or a platform like that. And at that point you can treat it more as a single video source and add a lot of different captions, and people can watch it that way rather than through the conferencing software, but with different languages and with a more robust closed caption option.
Matt: Next question is:
What captioning solutions would you recommend for meetings that feature multiple languages?
Matt: So, I mean, I guess it depends on your workflow and whether you're doing live streaming or this is a broadcast meeting. I suppose I would recommend different things, but in general we do have a couple options that support multiple language captioning. Falcon supports either four or six languages in one stream, depending on your output. Our broadcast options like the SDI encoders offer I think up to four different caption tracks in one video feed.
Bill: Right, though it's possible that the question is referring to a single conversation that switches rapidly between languages, so that’s a tough one.
Matt: OK yeah, so that's going to be tough. I mean, as far as automatic captioning, I don't know of any solution right now. I think Lexi, you kind of set the language before the program starts, but if there's a human captioner out there that can do that, I'm sure they charge a pretty penny for their translation services on the fly.
Bill: There are unfortunately so few individuals that are going to be able to real-time type or re-speak in multiple languages, kind of switching on a dime, so that's going to be a difficult service to contract out of just one person. So, I mean, if you come in and ask our service bureau about that, you can probably get some help from a team. But yeah, it is gonna be a little bit of a difficult thing to support from an AI perspective. I know a meeting platform like Google Meet doesn't ask you what language you're gonna speak ahead of time. It does auto-detect the language, but I have never really seen it do anything amazing in terms of being able to switch which language it thinks you're speaking on a rapid basis, so it might depend a lot on how controlled the flow is in terms of code switches.
Matt: And to clarify, too, what I was talking about at first was related to translation, so if you had all the same program audio translating to several languages you could do multiple tracks of captions in different languages, but this is a different subject. It's a little bit harder to kind of have an immediate solution for, at least. OK, so the next question here is:
Does EEG offer automatic captioning solutions for post-production videos? If so, where can I find more information on that service?
Matt: Yeah, so this is where our Scribe software is going to come into play, like Bill, I think we were talking about this before where you can ingest your videos in a post-production setting. There's a Lexi feature built into Scribe where you kind of just hit the Lexi button and it takes a few minutes and it transcribes the video and then it'll pop up the captions in a timeline. You can edit them and export the caption file on its own to upload to wherever.
Bill: Right, so in a self-service way, Scribe’s a good workflow for that, and it works with Lexi automation in the cloud. We're also really in the process of building out some processes with our Ai-Media partners, where they have a lot of expertise in this and can offer a pretty full menu of services. So that's going to be compatible with EEG equipment, and we can grab the captions from live things that are done with any of the iCap products to help you out with making that a full workflow if it's something like a fix-up of previously live material.
Matt: OK, next question is:
How can I build a software-only solution to live stream an event containing a mix of live captioning and pre-recorded, pre-captioned video?
Matt: So that's a little bit tricky just because of the pre-captioned part mixing in a live stream environment. It's gonna be tough to actually stream out some video with pre-captioned content on it sometimes. And then needing live captioning as well adds another layer of complexity, because there's confusion with how you're gonna start and stop the captions that are being created live. So I think it's definitely a tricky workflow for live streaming. There's not too great of a solution at the moment for the pre-recorded, pre-captioned content.
Bill: So yeah, the Falcon or Alta products will take pre-recorded captions on their input and pass them through, and you can punch in live captioning in only the areas where it's not previously captioned. So the problem isn't really on that end. I think your biggest challenge might be finding something to play that content out. The questions would be: What format are the pre-recorded, pre-captioned videos in, and how are you going to get those files? How are you going to spool them out while preserving the captions? And I think that's something that could be a challenge if you're looking at more lightweight solutions. Some things like Wirecast do that. There are a lot of things in the broadcast domain, of course, like broadcast playout servers, that will take an MXF file with captions and stream that out in a lot of different real-time formats. But I'd be assuming you're looking to do that in a more lightweight space, and I don't actually have a software recommendation for that, so that's something we'd be interested in hearing about, actually, if somebody does have a recommendation.
Matt: Next question is:
I'm interested in learning how to integrate a live human captioner with our workflow. We use Wirecast and the IBM Cloud Video platform for streaming university events.
Matt: So again, this is similar to the question before related to Wirecast, and this is where Falcon would come into play. You would just send your RTMP feed from Wirecast to Falcon. Falcon would use iCap to connect with the live human captioner. They would listen to the audio and create live caption text. It's sent back down over iCap into Falcon, and then from Falcon you can route it to the IBM Cloud Video platform pretty seamlessly. That's definitely a common workflow we've seen before. OK, next question is:
I'm a producer for a small, non-profit, Milwaukee, Wisconsin-based television station. We're getting into the production game as well as on-demand streaming sector. I want a cheap way of captioning the content we are creating, and a way to live caption our streaming.
Matt: So if this is a television station, it sounds like you're doing a mix of broadcast and streaming in this workflow, so I'd probably recommend one of our SDI encoders because then you could take the same video feed, feed it into the SDI encoder, create live captions either through an automatic solution or through a human captioner, and then send that out and split it out to however it's going to go. So from there you could send it to an RTMP encoder for the live stream, then you can send the other signal out to your broadcast signal. So that's why I would recommend a hardware solution, is because then you keep that SDI feed and you kind of just split it out as you need it from there. OK, our next question is:
Are there plans to support SRT and/or RIST inputs and outputs as an alternative to RTMP? Both of these protocols would provide greater reliability and quicker recovery during internet glitches.
Matt: I don't know about that. That's going to be a Bill question.
Bill: Definitely, yeah. So where we're currently doing that is on the Alta product. We have both hosted and self-hosted options of that, and you can connect the Alta product through something like Amazon MediaConnect, which is kind of a gateway that supports both of these protocols. It also supports Zixi and CDI, and that's the main way we've been servicing these workflows so far. I think a lot of people might be interested in seeing that in the Falcon space, which is an RTMP-centric product. Now, I mean, at the protocol level these are not very similar to RTMP, probably mostly for the better, but we definitely are interested in putting that into Falcon. There were some questions in my mind about how many of the platforms that Falcon outputs to actually support SRT and RIST, and that's one of the reasons why we've been handling that out of the Alta product, which is a little bit more high-end and pro-oriented. For example, with Falcon, in order to support more languages of captioning we recently put out a new update that allows Falcon to produce an HLS stream that has up to six languages of captions. Any language with VTT tracks works really well, but you need to be able to ingest that HLS stream into a platform. And one of the things we've seen is that, for all that RTMP is a very outdated protocol, it's still the protocol that 90% of the platforms want you to ingest in. So I think as the industry matures on that, we're certainly planning on looking at the options for Falcon. If you have a right-now workflow on this, I'd recommend looking into the Alta approach and we can guide you through how to make that work.
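For readers unfamiliar with how an HLS stream carries several caption languages, here is a minimal sketch of a generic HLS master playlist with one WebVTT subtitle rendition per language. It uses standard HLS tags, but the file names, bandwidth value, and function name are hypothetical, and it is not a description of what Falcon actually emits. The six-language cap mirrors the limit Bill quotes.

```python
MAX_CAPTION_LANGUAGES = 6  # limit quoted in the discussion above


def build_master_playlist(languages, video_uri="video.m3u8", bandwidth=3_000_000):
    """Build a generic HLS master playlist string.

    `languages` is a list of (language_code, display_name) pairs.
    All URIs here are placeholder names chosen for illustration.
    """
    if not languages:
        raise ValueError("at least one caption language is required")
    if len(languages) > MAX_CAPTION_LANGUAGES:
        raise ValueError("too many caption languages for one stream")

    lines = ["#EXTM3U"]
    for index, (code, name) in enumerate(languages):
        default = "YES" if index == 0 else "NO"  # first language is the default
        lines.append(
            f'#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="{name}",'
            f'LANGUAGE="{code}",DEFAULT={default},AUTOSELECT=YES,'
            f'URI="subs_{code}.m3u8"'
        )
    # The video variant points at the subtitle group defined above.
    lines.append(f'#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},SUBTITLES="subs"')
    lines.append(video_uri)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_master_playlist([("en", "English"), ("es", "Spanish")]))
```

A player that supports HLS subtitles lets the viewer pick among the listed languages, which is why the receiving platform has to accept HLS ingest rather than RTMP for this to work.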
Matt: Next question here is:
Can an EEG encoder convert existing broadcast or live streaming captions to IMSC 1.0 TTML captions for ATSC 3.0 broadcast?
Matt: We're getting to the more technical questions here, so again, it's definitely a Bill question.
Bill: So yeah, I mean, background: ATSC 3.0 is the new over-the-air broadcast standard. It's rolled out in a lot of markets in the United States. We're talking about antenna-based, over-the-air broadcast TV. You can do ultra HD in it. There's a lot of cool things you can do with it. It does have a new caption format attached to it, which is based on the TTML standards. In the implementations that I've been involved in with ATSC 3.0, it's actually the conversion software that goes from the MPEG-2 ATSC 1.0 stream into ATSC 3.0 that handles caption conversion. So in other words, you can use the same SDI caption encoder from EEG without needing to make a change to that. If you are doing UHD native content, you can actually use the new 650 encoder product, which has 12 Gbps 4K-compatible input and output, and that'll allow you to do native UHD captioning. The captioning still comes out in a VANC packet that has 608/708 data, but you should be able to push that through an ATSC 3.0 workflow as long as the software on that end, kind of a multiplexer and playlist software, supports caption conversion. And as these workflows roll out a little bit more, maybe there will be some different configurations of third-party products to work through on that. It'll be kind of exciting if ATSC 3.0 starts getting off the ground a bit more for details like that.
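To give a sense of what a TTML-based caption format looks like, the following sketch turns a list of timed text cues into a bare TTML document. It is an illustration only: it does not decode 608/708 data, it is not a conformant IMSC 1.0 profile, and it is not how any EEG or third-party converter is implemented. The function names and sample cues are made up.

```python
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def format_clock_time(seconds: float) -> str:
    """Format seconds as an HH:MM:SS.mmm clock time."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def cues_to_ttml(cues, language="en") -> str:
    """Serialize (begin_seconds, end_seconds, text) cues as a minimal TTML string."""
    ET.register_namespace("", TTML_NS)  # emit TTML as the default namespace
    root = ET.Element(f"{{{TTML_NS}}}tt", {f"{{{XML_NS}}}lang": language})
    body = ET.SubElement(root, f"{{{TTML_NS}}}body")
    div = ET.SubElement(body, f"{{{TTML_NS}}}div")
    for begin, end, text in cues:
        if end <= begin:
            raise ValueError("each cue must end after it begins")
        p = ET.SubElement(
            div,
            f"{{{TTML_NS}}}p",
            {"begin": format_clock_time(begin), "end": format_clock_time(end)},
        )
        p.text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(cues_to_ttml([(0.0, 2.5, "Hello everybody."), (2.5, 5.0, "Welcome.")]))
```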
Matt: OK, next question is:
When a blanking (clearing) caption is inserted between roll-up captions, does it matter if the blanking caption is one frame long or if it's many frames long?
Matt: That's, again, another - I don't have a good answer for that one.
Bill: Yeah, so it sounds like the question is from someone really performing real-time captioning and asking about some of the technical details, like OK, what commands do you press to erase the screen and get rid of your captions when you want to clear and start again? The Erase command, when you send it once, should be erasing the screen. And really, that’s that. So there shouldn't really be any change of behavior from sending an Erase command multiple times, or any need to do so. I would be sort of concerned that if there's a use case where sending erases multiple times is creating a different effect, it kind of sounds like there could be something else wrong in the system that's causing that, because erasing it once should get rid of it.
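Bill's point is that erasing is idempotent: one erase and many erases should leave the screen in the same state. A toy model makes that concrete. This is a simplified stand-in for a roll-up caption window, not an implementation of a real caption decoder, and the class and method names are hypothetical.

```python
class RollUpDisplay:
    """Toy model of a roll-up caption window holding a few visible rows."""

    def __init__(self, max_rows: int = 3):
        self.max_rows = max_rows
        self.rows = []

    def add_row(self, text: str) -> None:
        """Append a row, scrolling the oldest row off when the window is full."""
        self.rows.append(text)
        self.rows = self.rows[-self.max_rows:]

    def erase(self) -> None:
        """Clear everything on screen; repeating this changes nothing further."""
        self.rows = []


if __name__ == "__main__":
    display = RollUpDisplay()
    display.add_row("FIRST LINE")
    display.add_row("SECOND LINE")

    display.erase()                 # a one-frame blanking caption
    after_one = list(display.rows)

    for _ in range(29):             # the same erase held for many frames
        display.erase()
    after_many = list(display.rows)

    print(after_one == after_many)
```

If a real system shows different results for one erase versus many, that difference comes from somewhere other than the erase itself, which matches the concern raised above.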
Matt: OK, so it looks like this is the last question for today's webinar, and that is:
A 2020 study found translational imbalances between AI accuracy for white speakers and black speakers. Overall, AI misunderstood 35% of the words spoken by Blacks but only 19% spoken by Whites. What can the AI industry do overall to transcribe all speakers equitably?
Matt: So that's a great question. I don't think EEG directly has a hand in controlling how speakers are recognized, especially based on race. Lexi is configured to work with different backends, and they're the ones who control things like this. But we do have Lexi configured in a way that we can switch to whichever ones are making greater improvements in the area. This is one that's definitely really important, and if these numbers are true, that's definitely something that could be worked upon and will be worked upon, I'm sure, by the various backends that we work with.
Bill: Yeah and, I mean, if you look at these numbers, this is saying that 19% of the words spoken by Whites were inaccurate in this sample, which means that only about 81% of them were correct, which means that for the purposes of a professional captioning product that you're paying for, that's really not acceptable performance at all. So I've seen some studies like this and it's definitely an interesting subject with a lot of implications. I think the implications might be more relevant for some things that are consumer-facing technology that's used by everyone to do something like transcribe their voicemail or do Siri on their phone. And clearly that's used by everyone and it needs to support a really broad range of accents and usage patterns and sound environments. For Lexi, we're looking at usually pretty controlled environments, something like a TV newscast or corporate speaker series presentations, and the target accuracy is going to be much higher than either of the numbers that are quoted in this question. So I think it speaks to a general problem where the model needs to be trained on data that's going to match what's going to be used. And maybe that's less of a problem in an environment where you have pretty controlled speaker sets, hopefully diverse speaker sets, but people reading from a script and not really in a variety of colloquial environments. And I think, again, these numbers are not really good or what we'd be seeking with Lexi for any group of people.
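The percentages in the question read like word error rates, and the arithmetic Bill does (19% wrong means roughly 81% right) follows from that. Here is a minimal sketch of how a word error rate is commonly computed, using a word-level edit distance. It assumes the study's figures are word error rates, which the question does not state explicitly, and the sample sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # Classic dynamic-programming edit distance, one row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    reference = "thank you all for joining us today"
    hypothesis = "thank you all for joining is today"
    wer = word_error_rate(reference, hypothesis)
    print(f"word error rate: {wer:.1%}, word accuracy: {1 - wer:.1%}")

    # The figures quoted in the question, treated as error rates.
    for error_rate in (0.35, 0.19):
        print(f"{error_rate:.0%} errors -> {1 - error_rate:.0%} of words correct")
```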
Matt: And that is it for today's webinar.
Regina: Yeah, thank you so much, Matt and Bill. So yeah, we have reached the end of our live Q&A webinar. I'd like to thank all of the attendees for joining us today and submitting your great questions. And a big thank you to Matt and Bill for leading this event, plus Wes Long for delivering captions behind the scenes. We received so many questions, so if we didn't answer yours, we will follow up with you soon. And if you have any questions about EEG or any of the topics we discussed today, please reach out to our sales team at firstname.lastname@example.org. And within the next few days, everybody who signed up for this webinar will receive an email with the link to the recording as soon as it's available. So thank you all again and have a great rest of your week!
Bill: Next month's presentation is Truth or Dare. That's the next format.
Regina: Constantly switching it up.
Bill: Bye bye.
Matt: Thanks, everybody.