On May 14, 2020, EEG Video hosted a webinar to educate media professionals about our full range of closed captioning innovations.
New Products Showcase • May 14, 2020
During this event, John Voorheis, Director of Sales for EEG, explored the very latest solutions and features from EEG. Topics we covered included:
An overview of EEG’s closed captioning and subtitling solutions
How you can use our products for your captioning needs and workflows
The latest closed captioning advancements at EEG
A live Q&A session
Featured EEG solutions included:
To find out about upcoming EEG webinars, as well as all other previously streamed installments, visit here!
John: Alright everybody, looks like it’s 2:00 PM so we can go ahead and get started. Just let me know if you guys have trouble hearing me. Looks like we have our Lexi captions in action. So thanks everybody so much for joining us for our New Products Showcase and today we're going to go through a lot of the stuff that we would typically go through in ordinary times at NAB.
So, you know, just a little bit of background today for those of you who are not - who may not be too familiar with EEG, we've been a company for nearly 40 years now. We were actually founded as Electrical Engineering Group in the early 1980s and we helped develop the North American 608 closed caption standard in conjunction with PBS. So it's definitely - you know, closed captioning has been part of our DNA for quite some time and we're really happy to be still providing accessibility solutions in 2020 as they become more and more important.
HD492 iCap Encoder
So here I'm gonna take us through our products. You know, our flagship SDI closed caption encoder right now is our HD492 iCap encoder. It’s a dual input/output encoder, so it has two SDI inputs and two SDI outputs, as well as a decode output for displaying open captions. Those two inputs and outputs can be really useful if you're doing multi-language captioning.
So, you can use something like our iCap Translate product in that second output to translate captions, you know, from your first language from English to another language, like Spanish, for example. And this encoder is fully compatible with, you know, iCap captioners, as well as Lexi and all our caption partners.
The iCap encoder, it features a number of add-on modules as well. You know, you can - there's a module for CCMatch where you can delay the video. We have options for SCTE-104 insertion, closed caption playback, CC delay matching as I - as I mentioned with our CCMatch modules.
You can actually use this encoder to delay video by up to 10 seconds. And kind of the big news, the big product update for the 492 is the 4K version that we’ll be releasing this fall that will be compatible with 12G SDI 4K video.
Also has a 2110-40 module which is pretty big news in that it's future-proofed for IP video workflows. You can actually inject back data into a parallel 2110 workflow, which can be useful, you know, if you're building out - you’re building out a workflow in SDI, but you know you want to be future-proofed, you know, for that eventual move to 2110.
Alta Software Caption Encoder
And speaking of 2110, that leads us to our Alta software caption encoder, which is a closed caption encoder for IP video environments. You know, again, this is - you know, as the slide says, this is primarily a broadcast product for cable networks, OTT networks, and this is just our virtualized encoder to work with MPEG Transport Stream and 2110.
You know, we've got quite a few of these out in the wild as it is and we're starting to see a lot more interest in 2110 as that move’s starting to occur. So very exciting, you know, to see that technology become more commonplace and, with it, Alta.
Alta supports a number of formats, including HEVC, DVB subtitles, SMPTE 2038/ASPEN. Also supports SCTE-35 trigger insertion as well so, you know, just really the advanced form of our HD492 for the next generation video format.
We also have, you know, focusing a little bit more on the 2110 as opposed to the Transport Stream, we have the Alta Network Gateway - just a 1RU live closed caption system that accommodates up to 10 channels of SMPTE 2110-30 audio and 2110-40 ancillary data simultaneously. Also supports NMOS Management API.
So definitely happy to answer any questions you guys might have about Alta, either for MPEG Transport Stream or 2110. Bill McLaughlin, our VP of Product here at EEG is also present on the webinar. He works extremely closely with Alta, so he's another great resource for any questions you may have about your move to 2110 or MPEG Transport Stream.
Moving on, here's our Falcon live streaming RTMP closed caption encoder, which is really one of EEG's coolest products and also something that we've seen, you know, kind of as we've moved into this new state of normality, you know, we've seen a lot of interest in this and it's really exciting because it really enables accessibility for all sorts of virtual events without the addition of any hardware.
Now, as you can imagine, with so many things that, you know, used to be in person, such as this very webinar, moving to a virtual world, there are going to be new accessibility needs. And kind of one of the challenges I've heard a lot in my day-to-day interaction with people looking for captions is not being able to make it into an office or, you know, into a facility to use equipment. So Falcon really can be useful in that regard, too, in that it’s a fully virtual product; it's an RTMP input/output closed caption encoder.
We're seeing it used for, you know, a lot of universities, really any streaming events, and how it works is you just take the output from your streaming media encoder. You point that to Falcon using an iCap connection. Falcon either sends the audio to a closed caption writer, a CART transcriptionist, or to Lexi service. The caption data is returned to Falcon in the cloud and you have an RTMP output that you can point to multiple CDNs using one channel of Falcon.
It’s very scalable as well, you know, it's available as an annual license, as well as a monthly license, so if you have multiple video streams you might be working with, you can spin those up - you can spin those up using additional Falcon licenses as necessary without being locked in for a whole year, so it can really scale with you as your captioning needs scale.
We have an HLS output expansion scheduled for 2020 Q3, you know, right around IBC time. Also have a direct HLS output with VTT or TTML caption tracks and we're working on improving support for non-European language character sets so, you know, think your Cantonese, your Mandarin, your Korean, languages like that which previously we’d only supported Latin character sets with Falcon. So that’s some very exciting news there as well.
Lexi Automatic Captioning
Now, moving on from Falcon, Lexi, as we've been talking quite a bit about, is our automatic captioning service that leverages speech recognition technology. You know, this technology, it’s really exciting in that it's enabled a lot of people to provide closed captions in situations where it may not have been practical, you know, or might not have been affordable to do so.
You know, right now we state the accuracy of our Lexi service at about 90% which is out of the box and we offer, you know, what we call Topic Models to improve that and to program Lexi with specific vernacular and vocabulary that might be relevant to the content that you're captioning. And with those custom Topic Models in place, you know, we've seen accuracy well upwards of 95%.
You know, we really think of Lexi at EEG as being an excellent supplement to live closed captioners. It's really great because, you know, unlike when you work with a human being where you have to schedule this person in advance, with Lexi, you just have to - you know, you just have to activate it through the user interface and you're all set to caption. So it works great in situations where, like, breaking news, for example, where there may be very short notice and you might need a captioner or, you know, just for captioning streams that you might not otherwise be able to afford to do so.
You know, specifically in the case of higher education for right now, you know, you may want captions for every class that’s gone online, but it may not make sense to retain a live captioner, you know, for captioning into an entire semester worth of courses. So, you know, there’s definitely a lot of really great applications for the Lexi service. And we're really excited about watching technology as it continues to develop and as it continues to, you know, work to provide accessibility alongside live captioners.
You know, some developments with Lexi are, you know, we’ve released additional ASR models for higher accuracy, lower latency. You know, we have some preset models that we offer with the service for news, other events that we’re continuing to broaden those.
You know, a really big development since the last NAB is a 1RU unit, similar in form to our closed caption encoders, that we term Lexi Local and that provides automatic captioning without cloud connectivity. This was specifically designed kind of for the three-letter folks in Washington and, you know, other corporate applications, content that isn't really designed for the public where, you know, there may be a no-cloud policy just because it's, you know, extremely sensitive like, you know, a corporate - like the minutes of a corporate or, you know, maybe a new product release, something like that.
But the license model can be very favorable to broadcasters who are producing a lot - a significant amount of live closed caption content. It’s really an unlimited license in that we don't track usage and it's not billed by the hour/by the minute, but it, you know, the Lexi Local price point, it's an annual - it's an annual fee that includes unlimited closed captioning.
So with that, you know, there is no metered usage, so if you're in a situation where you’re captioning 24/7 or if there’s a significant amount of usage, you know, Lexi Local and the license model might be beneficial to you from a cost perspective. So, you know, if you think that might be the case, you know, I invite you to discuss that with me or another member of our sales team. I can definitely kind of tell you at what point it might make sense to use Lexi Local from a cost perspective over something like the cloud-based version.
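The cost comparison John describes (a flat annual fee versus metered billing) comes down to a break-even calculation. The sketch below is only an illustration of that arithmetic: the function names are hypothetical, and the dollar figures in the usage note are placeholders chosen for round numbers, not EEG pricing.

```python
def breakeven_hours(annual_license_fee: float, metered_rate_per_hour: float) -> float:
    """Hours of captioning per year at which a flat annual fee equals metered billing."""
    if metered_rate_per_hour <= 0:
        raise ValueError("metered rate must be positive")
    return annual_license_fee / metered_rate_per_hour


def cheaper_option(hours_per_year: float,
                   annual_license_fee: float,
                   metered_rate_per_hour: float) -> str:
    """Return 'flat' when the annual fee costs less than metered usage, else 'metered'."""
    metered_cost = hours_per_year * metered_rate_per_hour
    return "flat" if annual_license_fee < metered_cost else "metered"


# A 24/7 channel runs this many hours in a (non-leap) year.
HOURS_24_7 = 24 * 365
```

With a placeholder fee of 10000.0 per year and a placeholder metered rate of 5.0 per hour, `breakeven_hours(10000.0, 5.0)` gives 2000 hours, so a 24/7 channel (`HOURS_24_7`) would favor the flat fee while 100 hours a year would favor metered billing. Real figures would need to come from the sales team.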
And here's a nice graphic that we’ve generated here illustrating the differences between Lexi Local and Lexi. And you see really the primary difference is with regular - the ordinary cloud Lexi, you can integrate that with human captioning and that - they can exist side by side. You can detect the presence of human captioners, where of course that wouldn't be possible with Lexi Local since you're going to be in a lockdown environment without - without connectivity to human captioners with an internet connection. It's also not virtual and it does have that fixed annual cost.
And Regina says there’s a question: Does Lexi Local include an integrated closed caption encoder? And no, it does not but, again, that's something that we can kind of discuss how that works if you want to contact email@example.com. Definitely can kind of talk about the options for using a closed caption encoder, you know, with Lexi Local and how that will impact the overall cost of that device.
The AV610 CaptionPort - you know, again, this is a product we’ve seen a lot of increased interest in with this, you know, new normal, so to speak. It’s fully compatible with iCap, uses an iCap connection. It's also, you know - it's also compatible with Lexi. You know, as I mentioned, you can also configure it for use over RS-232, Telnet, and it can be configured to include optional modem.
What the AV610 does is it's not a caption encoder in that it displays - it displays open captions only and, as you can see on this graphic, you can scale the video so that if you're giving a presentation, much as I am today, you can scale it so, you know, on the presentation today you see these captions at the bottom of the screen but, you know, and fortunately we have - we have a very good marketing person putting together excellent slides so nothing is obstructed by these captions, you know, but often times presentations there’s a lot to view; you have, you know, slides with like infographics, histograms, that sort of thing where captions might obstruct those.
With the CaptionPort you can scale - you can scale the video so nothing's obstructed by the closed captions. And again, you know, it can be very good in a venue setting where you have, you know, a display in the front of the room, and you don’t want people in the back to miss some things in the presentation because of the captions. So it's definitely, you know, extremely useful in education, corporate settings, as well as in-venue events when those ultimately return, so that's a very exciting product.
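The idea of shrinking the video so the caption area never covers it can be sketched as simple geometry. This is an assumed illustration of the concept, not the AV610's actual scaling logic; `Rect`, `scale_video_above_captions`, and the caption band height are all hypothetical names and values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    width: int
    height: int


def scale_video_above_captions(frame_w: int, frame_h: int, caption_band_h: int) -> Rect:
    """Scale the video uniformly so it fits above a caption band at the bottom.

    The aspect ratio is preserved and the scaled picture is centered horizontally.
    """
    if not 0 <= caption_band_h < frame_h:
        raise ValueError("caption band must be smaller than the frame height")
    available_h = frame_h - caption_band_h
    scale = available_h / frame_h
    scaled_w = round(frame_w * scale)
    x_offset = (frame_w - scaled_w) // 2
    return Rect(x=x_offset, y=0, width=scaled_w, height=available_h)
```

For a 1920x1080 frame with a 216-pixel caption band, this places a 1536x864 picture at x offset 192, leaving the bottom band free for captions.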
There’s going to be a 4K version shipping this summer in 2020 that can overlay captions in native resolution for 4K video with a 12G SDI connection and, again, as I said, it’s designed for large-screen live events.
Wow, and it looks like at this point we've almost covered everything. Went a little bit quicker than I expected. I didn't mean to run through this so quickly for everybody, but definitely interested in hearing any questions that you guys have. So, you know, definitely want to have this be conversational.
Can the AV610 do a full page of captioning? We have something with a separate screen for open captioning. Bill, do you know the answer to that? I'm not sure I fully understand the question.
Bill: Yeah, the - basically, the question revolves around the video generation feature and we are - in the latest release there’s some new features surrounding this, so it's great to mention. You actually do not have to put an input video into the product at all. You can upload through the web page a static image in a bunch of different formats–either a JPEG or an SVG vector graphic–and you can actually display that along with the scrolling text in kind of an extra large font that goes to about half screen worth of large text and the other part is occupied by the static image, so it's kind of perfect for that type of situation where you have an event where you don’t have a separate magnification video that you're trying to put on the TV, you're just trying to use the display to show captions kind of in as readable a way as possible.
John: Excellent, excellent. And question 3: Can Alta encoding support local closed caption encoding, say, from a teleprompter feed?
Bill: Yes, there's a Telnet input. Each channel that you set up on the Alta server will have a separate port for plain telnet input from a local device, like a prompter, and yeah, that works the same way it would with an SDI caption encoder.
John: Can Lexi Local feed multiple closed caption encoders at the same time? Not simultaneously, no. It's one-to-one simultaneously. If you're running simultaneous, you can't caption multiple video feeds at once with Lexi Local, but Bill can kind of elaborate on how that works in terms of supporting multiple closed caption encoders at once.
Bill: Yeah, you can connect multiple encoders and, actually, you can license multiple simultaneous streams on a single physical Lexi Local server. So the question from the point of view of installation is yes. From the point of view of licensing, each one requires a separate license.
John: Question 5: Are there any plans or considerations to make a portable non-rack unit of the 492? We do offer the HD1492, which is an openGear card for the 492, but I don't imagine that's the question that's being asked. You're asking is there kind of like a portable 492 that you can take around? I'm not aware of any plans to develop that although we do occasionally get that same question. Bill, do we have any plans to kind of make a more portable version of 492 at this point?
Bill: Nothing that's really being released at this time, no.
John: It's one of those things that we get really kind of occasional questions about, but it's never been something that we've gotten a huge interest in, so it's something that's, you know, we're keeping an eye on and it’s still under discussion but, unfortunately, I don't think we have any concrete plans to productize, you know, kind of a portable HD492 at this time. Can Falcon be used with platforms like Twitch and OBS for captioning?
Bill: I mean, yes, that's exactly - that's exactly what it does. I mean, OBS would be a popular open source streaming encoder, and basically from OBS you can put out an RTMP feed to Falcon. You put that to Falcon and then from Falcon you can send to - you know, Twitch is one of the many destination platforms that can support the embedded captions that Falcon will put on the RTMP stream.
John: Yeah, I mean, I think the best way to think of Falcon–and Bill, please correct me if I'm wrong–is anything that accepts an RTMP stream can work with Falcon, so long as the web player has a caption decoder which, you know, of course if it has a CC button, it will work with Falcon provided the CDN accepts an RTMP stream. And Bill, do you have anything to add there?
Bill: No, I mean, absolutely, as you said, I mean, anything that supports RTMP-embedded captions, we’re good to go. I mean, certainly Twitch is a popular one and that's been proven many times.
John: You say Lexi is 90-95% accurate. How do you measure accuracy using any particular model, for example, NER? How does Lexi handle traditionally unsuitable video content: crosstalk, significant background noise, music, etc.? You know, how to - how to measure accuracy in speech recognition, you know that's - that's kind of a tricky question as to, you know, what time frame are you looking at? And Bill, when we came to 90%, you know, what time frame were we looking at for Lexi?
Bill: Yeah, that's - the quoted numbers are more of - I'm familiar with the NER methodology, which is a methodology where you will judge the accuracy of a transcript based not only on how many words are substituted, but also based on, you know, not every word is counted the same. You receive - you know, the score kind of purports to be out of a hundred, but it's not really a percentage if that makes sense. It's really a score out of 100 and the score, it penalizes errors more when those errors damage the meaning of the sentence or phrase a lot and punishes errors less when the errors are kind of minimal and don't damage the meaning of the phrase that much.
And that might sound very subjective, but there's a relatively well-formed scoring system around this that's more objective than it would sound at first blush. And some countries–not the United States–but, for example, Canada and the UK have adopted standards for live broadcasting that reference NER in what they ask broadcasters to do.
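As a rough illustration of the severity-weighted scoring Bill describes, the NER model is commonly published as (N − E − R) / N × 100, where N is the number of words, E the weighted edition errors, and R the weighted recognition errors. The sketch below assumes the commonly cited weights of 0.25, 0.5, and 1.0 for minor, standard, and serious errors; those weights and the function name are assumptions here, not something stated in the webinar.

```python
# Assumed severity weights from the commonly published NER model description.
SEVERITY_WEIGHTS = {"minor": 0.25, "standard": 0.5, "serious": 1.0}


def ner_score(n_words: int, edition_errors: list[str], recognition_errors: list[str]) -> float:
    """Compute an NER-style score out of 100 from lists of error severities."""
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    e = sum(SEVERITY_WEIGHTS[s] for s in edition_errors)      # weighted edition errors
    r = sum(SEVERITY_WEIGHTS[s] for s in recognition_errors)  # weighted recognition errors
    return (n_words - e - r) / n_words * 100
```

For example, a 1,000-word sample with 4 minor edition errors and 5 serious, 10 standard, and 20 minor recognition errors loses 1 + 15 = 16 points of weight, for a score of about 98.4. The same 39 errors counted equally would look noticeably worse, which is the point Bill makes about not every word being counted the same.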
We found as a rule of thumb that, you know, we generally quote in a more simple word error rate. We've generally found that NER scores for most Lexi transcripts are a little bit higher than straight word error rate scores since many of the errors are ones that NER would judge to be sort of less severe than average. But the numbers look similar but, you know, as I said, are not directly comparable.
But, so, typically we’ll do something like a 5- or 10-minute sample of a given content and compute the simpler word error rate score since that's just something that is easier for more people to do quickly and I think gives you - gives you the right impression of where you're going for most types of content.
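The simpler word error rate measurement Bill mentions can be sketched as a word-level edit distance between a reference transcript and the automatic transcript, divided by the number of reference words. This is a generic textbook formulation with lowercase, whitespace-split tokens as a simplifying assumption; it is not EEG's internal scoring tool, and the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)
```

Under this formulation, an accuracy figure is typically read as 1 minus the word error rate, so a 5- or 10-minute sample with a word error rate of 0.10 corresponds to the roughly 90% quoted for out-of-the-box results.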
And, you know, re: the second part of that question, you know, clearly your Lexi accuracy is going to vary a lot from - you know, generally you could receive 95% or higher accuracy when you have a presentation where, you know, a polished speaker who’s easy to understand speaks, who speaks in a measured pace; perhaps, you know, a news anchor reading a teleprompter talks about, you know, sort of very well-known subjects that are not full of jargon words and, you know, any - it’s certainly possible to use Lexi on a much broader range of content than that, but if you consider what the simplest case is, essentially everything you add on top of that is maybe a small added degree of difficulty that is going to impact the results a bit.
And, you know, we do a lot of research and work with a lot of partners on trying to improve all of these issues: issues with crosstalk and multiple speakers, issues with background noise and music. You know, these are all active areas, but it's an added degree of difficulty and that's typically why it's simply not possible to cite a single number in any measurement system that Lexi or any other product is going to meet for every single piece of content, so it's always an active situation. I think with a customer that has a particular kind of content, building up the vocabulary Topic Models and building up the confidence that, you know, it's suitable for a specific form of content or what can be done if the initial results are less than hoped. So, you know, it’s - different content has different results, for sure. I mean, that’s not - you know, that’s a problem that's managed rather than solved, let’s say.
John: Yeah, I would just go on to say that, you know, I would say we manage it very well with the custom Topic Models in the sense that we really empower users to make it, you know, almost as good as they like in a lot of senses and that, you know, I think there is, you know, to some extent really a direct correlation or not to - but there is a direct correlation to, you know, how much time you put into the custom Topic Models and the results you get. It’s really quite–
Bill: Well, the Topic Models are very good for proper names. They’re very good for jargon-y phrases about a specific subject. If the problem you’re fighting is, you know, is something more like crosstalk or loud background sounds, you know, the impact - just, you know, that’s kind of a different problem and that often needs to be addressed by trying to figure out whether it’s possible to get a cleaner and more isolated dialogue might, you know, that can be a different set of problems than word recognition problems.
John: Sure, sure, absolutely. And those are problems that, you know, a live captioner would have - those are issues a live captioner would also have difficulty with, but the difference for–
Bill: I mean, as humans we have a tremendous ability to kinda ignore what doesn't suit us to hear in a moment, but yes, it certainly is a degree of difficulty factor for human captioners as well.
John: Okay, and I think we kind of covered this, but there is - there's two more questions here. What is the success versus error rate? For example, in our video production classes we have a diverse student body with various accents. Can the encoder discern such?
Bill: Yeah, and the Lexi system - you know, we work with several different models that are trained on a pretty wide variety of accents, you know, including really global models that are designed around the idea that many business English speakers are not English-as-a-first-language speakers.
But accents is one more thing that we talk about a degree of difficulty that, you know, if someone - you know, a good rule of thumb would be to ask, you know - you know, as a listener, do I perceive that the person I'm listening to has a strong accent? Do I ever have trouble understanding them? And, you know, that is often a good lens into whether the automatic captioning is also likely to have problems.
John: Sure, sure. And what is the success - oh, how much delay does Falcon add?
Bill: The usual answer is about one second. The technical details on that would be that it really has a lot to do with your keyframe interval on the live RTMP that you are uplinking. A typical value will be something like, you know, two keyframes or one keyframe per second, in which case your delay through Falcon’s going to be about a second. The delay is going to be a minimum of that keyframe interval, so for that reason we typically will recommend that, you know, when using Falcon you're conscious of the video keyframe interval. And that's true of a lot of live streaming that is not Falcon as well but, you know, Falcon is one more thing in the chain, so it's one more reason to think about trying to allocate enough bandwidth to have pretty regular keyframes.
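The relationship Bill describes, where the added delay is at least one keyframe interval, can be sketched from the encoder settings. The function names are hypothetical and the calculation only gives the lower bound he mentions, not a measured Falcon latency.

```python
def keyframe_interval_seconds(gop_frames: int, fps: float) -> float:
    """Seconds between keyframes for a given GOP length (in frames) and frame rate."""
    if gop_frames <= 0 or fps <= 0:
        raise ValueError("gop_frames and fps must be positive")
    return gop_frames / fps


def min_added_delay_seconds(gop_frames: int, fps: float) -> float:
    """Lower bound on added delay, taken to be one keyframe interval."""
    return keyframe_interval_seconds(gop_frames, fps)
```

A stream at 30 frames per second with a keyframe every 30 frames gives a floor of 1 second, matching the usual answer above, while a keyframe every 60 frames doubles that floor to 2 seconds. Some streaming encoders expose the keyframe interval directly in seconds, in which case that setting is the floor.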
John: Alright. Well hey, thanks so much everybody for attending EEG’s New Products Showcase today here. It was great to see you all virtually, for you to join us for this 30 minutes. Thanks so much everybody and have a great day. Happy captioning!