YouTube’s Automatic Closed-Captioning of Mathematical Speech (Part 2)

Last semester, as I spent untold hours editing the closed captioning that YouTube automatically generated for the math videos on my channel, I got a crash course on the capabilities and limitations of this system. The editing was perhaps not legally necessary; it was extra work that I took on because a student with a hearing impairment was enrolled in my class, and I wanted to ensure that the review videos I provide to my students were accessible to him as well.

I think the resources my university offers to ensure that instructors can reach all students, and not just those without audio/visual impairments, are fairly typical. After discussions with the cognizant people at my university, I’ve drawn a few conclusions:

  • Mostly by accident, my videos are ADA compliant since I made the decision to both write out the solutions and talk through them.
  • While the automatic closed-captioning provided by YouTube may be minimally compliant with ADA, I’m not sure that a student with a hearing impairment could always follow the transcriptions due to a number of errors.
  • Aside from punctuation, capitalization, and the occasional homonym (e.g., right vs. write), YouTube does a pretty good job at transcribing ordinary speech.
  • Naturally, YouTube’s automated closed-captioning is not to blame when I don’t enunciate clearly, follow a rabbit trail of thought and then have to backtrack, use poor grammar, make an outright mistake, etc.
  • However, YouTube seems to have a lot of difficulty providing automatic closed-captioning of mathematical speech.

Fixing these transcription errors took an awful lot of time. I don’t want to know how many hours I devoted to fixing the 120 or so videos (each about 3-10 minutes long) so that my hearing-impaired student could have full access to my class. About halfway into this project, I started writing down some of the closed-captioning errors. I wish I had thought to do this near the start, but oh well.

Phonetically, I can understand why most of these errors were made. But these mistakes really shouldn’t have happened. Here are my favorite howlers that I recorded, showing both what I said and what YouTube thought I said.

  • “931,147,496” became “930 1,000,000 147,000 496”
  • “A \cap C,” pronounced “A intersect C,” became “A inner sexy”
  • “arithmetic” became “rhythm sick”
  • “capital X” became “Catholics”
  • “cardinality” became “carnality”
  • “divisible by 5” became “visited his wife live” (I have no idea how that happened)
  • “e^x” became “eat ooh the x”
  • “for succinctness” became “force the sickness”
  • “n \choose n,” pronounced “n choose n,” became “and shoes and”
  • “set containing” became “second taining”
  • “\sqrt{2}” became “squirt tuna”
  • “two ways in” became “too wasted”
  • “what f(3),” pronounced “what f of 3,” became “whateva 3”
  • “x \in B,” pronounced “x is in B,” became “sexism be”
  • “x \in B \cap C,” pronounced “x is in B and C,” became “x is Indiana see”
  • “x \in C,” pronounced “x is in C,” became “excellency”

Here’s the complete list of howlers that I recorded for posterity; each line gives what I said, followed by what YouTube thought I said. If I’ve learned nothing else, it’s that I need to be more proactive about ensuring the mathematical accuracy of closed-captioning for my YouTube videos.

4 for
857 a 50 7
1232 1230 two
4761 4760 1
19,999 19,000 999
46,376 40 6376
123,552 120 3,552
5,565,120 five million 565,000 120
931,147,496 930 1,000,000 147,000 496
(2,\emptyset) 2d sent
(20,8) 28
[1,2] one too
12 \choose 4 12 juice 4
16 \choose 8 16 choosing
3 + 1 = 4 surplus one mix for
4 \choose 0 4 2 0
4 \choose k four twos k
49 \choose 5 49 she’s 5
50 \choose 6 52 six
8 \choose 2 a choose to
A \cap C A inner sexy
A \cap D a intersecting
A \cup B a you be
A \cup C a UNC
A \cup C a you will see
a proof approved
A^c a compliment
a_i asa by
all multiples of almost visit
an element of A known the debate
an element of A normal today
and divisible and as above
and positive 50 + + 50
and tens intense
and would let this be 3 andrew lippa p3
arithmetic earth to
arithmetic rhythm sick
As ace
B but not C be but not si
B \cap C b in a sexy
B if beef
bijection bi CH action
bijection bite jection
bijection by dejection
bijection by ejection
bijection by jection
bijection by Junction
both sets both says
capital X Catholics
cardinality carnality
Cartesian car to shull
codomain code Amin
coordinate cordon
coordinate court
coordinates corners
coordinates have cort in sap
cosine cosign
disjoint destroyed
divisible by 5 visited his wife live
e^x eat ooh the x
element of A illness of A
element of A mellow today
element x that Windex
elements of us
empty MQ
\emptyset descent
\emptyset intercept
equal able
exponent x1
factored acted
factorial fact welders
fill in film
flipping four coins philippine for coins
for succinctness force the sickness
hence in Hanson
i eye
i aye
If I divide by 15 If I / 15
in A nae
in there a bear
infinite if an
infinite imp an
infinite infant
into five in 2 5
is ice
j \choose r j choose arms
kth cave
kth kate
likewise lakh wise
n \choose n and shoes and
nth row nth throw
one-to-one 121
onto on 2
r \choose r our shoes are
r to art at
r to already
\mathbb{R}^2 are too
\mathbb{R}^2 our too
r’s hours
same row samro
second coordinate sec cornered
set containing second inning
set containing second taining
set containing seconds hanging
set containing secretary
set containing 1 second anyone
since A has say has
sixth one six-month
square swear
\sqrt{2} score 2
\sqrt{2} squirt of tuna
team A teammate
term in it terminate
than zero gloves are off
that’s chosen that’s Showzen
then x the next
therefore there for
this entry in the century plus
to the k decay
two are to are
two ways in too wasted
union you need
up here pier
what f(3) whateva 3
will be 4 will before
with n=4 finials 4
would subtract was attract
writing riding
x is extras
x is in exiting
x is in A x as a native
x is in A x is nay
x is in B sexism be
x is in B and C x is Indiana see
x is in C excellency
x is in C X’s and see
x_2 next to
x_2 text too
x-coordinate export
y why
y wine
y is greater than or wider
ys wise
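
A list like this could also be reused to flag suspect captions for a human to review. The following is only a minimal sketch in Python, not a tool that was used for these videos: the table `KNOWN_HOWLERS` and the function `flag_howlers` are hypothetical names, and the table holds just a handful of the pairs recorded above. It flags phrases rather than replacing them, since a phrase like “excellency” could occasionally be what was really said.

```python
import re

# Hypothetical lookup table: a few of the mis-transcriptions recorded above,
# mapped to the phrase that was actually spoken.
KNOWN_HOWLERS = {
    "inner sexy": "intersect C",
    "rhythm sick": "arithmetic",
    "catholics": "capital X",
    "carnality": "cardinality",
    "force the sickness": "for succinctness",
    "and shoes and": "n choose n",
    "sexism be": "x is in B",
    "excellency": "x is in C",
}


def flag_howlers(caption: str) -> list[tuple[str, str]]:
    """Return (heard, probably_meant) pairs found in one caption line."""
    found = []
    for heard, meant in KNOWN_HOWLERS.items():
        # Whole-phrase, case-insensitive match so "Catholics" is caught too.
        pattern = r"\b" + re.escape(heard) + r"\b"
        if re.search(pattern, caption, re.IGNORECASE):
            found.append((heard, meant))
    return found


if __name__ == "__main__":
    line = "so the set Catholics has carnality 5"
    for heard, meant in flag_howlers(line):
        print(f"heard '{heard}', probably meant '{meant}'")
```

Each flagged pair still needs a person to confirm it against the audio before the caption is changed.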

