Sound Topics - Music and Computers - Advanced Speech Processing at Columbia University

I’m currently taking Microsoft’s DEV287x, Speech Recognition Systems, on edX.org. We are supposedly going to build a speech recognition system from scratch in Python over the next several months. The course makes frequent references to how human speech and hearing work, and to the need for a computer to process the speech signal in a way akin to how humans hear what we say to each other. For instance, the signal is passed through a mel filterbank, which gives finer frequency resolution to the lower frequencies, and then the log of the filter outputs is taken, because our hearing resolves pitch and perceives loudness in a similar way. While working on Module 02 of the course, audio signal processing, I came across some fascinating references on Columbia University web pages and beyond.
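For readers curious what those two steps look like in code, here is a minimal NumPy sketch, not the course's implementation. It assumes the common HTK-style mel formula, 26 triangular filters, a 512-point FFT, a Hamming window, and 16 kHz audio; the function names hz_to_mel, mel_to_hz, mel_filterbank, and log_mel_energies are hypothetical, and DEV287x's own code may choose different values.

```python
import numpy as np

def hz_to_mel(f_hz):
    # HTK-style mel formula (one of several conventions in use; an assumption here)
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Hypothetical helper: triangular filters spaced evenly on the mel scale,
    so they are narrow at low frequencies and wide at high frequencies."""
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                            num_filters + 2)
    # Map each edge frequency to an FFT bin index
    bin_edges = np.floor((n_fft + 1) * mel_to_hz(mel_edges) / sample_rate).astype(int)

    fbank = np.zeros((num_filters, n_fft // 2 + 1))
    for i in range(num_filters):
        left, center, right = bin_edges[i], bin_edges[i + 1], bin_edges[i + 2]
        for k in range(left, center):        # rising slope of the triangle
            fbank[i, k] = (k - left) / (center - left)
        for k in range(center, right):       # falling slope of the triangle
            fbank[i, k] = (right - k) / (right - center)
    return fbank

def log_mel_energies(frame, sample_rate, n_fft=512, num_filters=26):
    windowed = frame * np.hamming(len(frame))                  # taper the frame edges
    power = np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2 / n_fft
    energies = mel_filterbank(num_filters, n_fft, sample_rate) @ power
    return np.log(energies + 1e-10)                            # small floor avoids log(0)

# Usage: one 25 ms frame (400 samples at 16 kHz) of a 1 kHz tone
sr = 16000
t = np.arange(400) / sr
features = log_mel_energies(np.sin(2 * np.pi * 1000.0 * t), sr)
print(features.shape, int(np.argmax(features)))
```

Because the filters are spaced evenly in mel rather than in Hz, the lowest filters are only a few FFT bins wide while the highest span dozens of bins. In the usage example, the largest log energy should land in one of the filters whose pass band covers 1 kHz.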

One site is a free online version of an old book, Music and Computers, that apparently didn’t sell well enough to stay in print. The book aims both to teach the basics of sound and music and to teach you to analyze music with a computer.

The book was written as a collaboration between the Columbia University Music Department and the Mathematics and Music Departments of Dartmouth College. In a foreword, a Dartmouth College math professor says one point of the book was to show, using sound, that math can be not only very powerful but also a lot of fun, instead of teaching math only through very dry topics. In sidebars, the book has downloadable music samples, applets for illustrating and playing with music (these require Java and an ActiveX control, which is not too secure on Windows unless you run it in a virtual machine you can trash afterwards), and links to further information within the book and on the Internet (although, because of the age of the book and the aging of the Internet, some of the links produced 404 Page Not Found errors).

http://sites.music.columbia.edu/cmc/MusicAndComputers/

The second Columbia University website is an active graduate-level course on Topics in Spoken Language Processing.

http://www.cs.columbia.edu/~julia/courses/CS6998-2019/AdvTopicsSpeechProcessing19.html

It covers a wide range of topics, from how we produce speech and how computers can process the sound of speech, to analyzing human emotions and behaviors such as deception and trust, humor and sarcasm, and likability and charisma. An index of downloadable course materials is here (downloading anything over the Internet, especially via http:// rather than https://, can be insecure):

http://www.cs.columbia.edu/~julia/courses/CS6998-2019/

Of particular interest to me, since we have a grandchild with a speech impediment, are the book chapter on phonetics and a link to a manual for a free, very powerful open-source program that provides graphic displays for analyzing phoneme pronunciation (I haven’t tried the software itself yet to see if it still works; it appears to be an old program).

http://www.cs.columbia.edu/~julia/courses/CS6998-2019/[07]%20Phonetics.pdf


Edit/Update: The direct link does not seem to work from hearingtracker.com, but the chapter is listed in the index of course materials linked above.
2nd Update: The Praat manual is available from UCSD: http://wstyler.ucsd.edu/praat/UsingPraatforLinguisticResearchLatest.pdf

Apart from these two Columbia University sites, there is a highly cited book, Speech and Language Processing, by Daniel Jurafsky and James H. Martin; the 3rd edition is available free in draft form from Stanford University.

https://web.stanford.edu/~jurafsky/slp3/

Also highly cited on the Internet as reference material on speech recognition is the University of Edinburgh (Harvard of Scotland) course on Automatic Speech Recognition, which names Jurafsky’s book as the essential text for the course:

http://www.inf.ed.ac.uk/teaching/courses/asr/lectures-2020.html

So although these references may seem “off-topic” for a hearing website, consider that a number of folks here extol the benefits of Google’s Live Transcribe: the science above is the foundation on which Live Transcribe is built. A really good transcription system for the hard-of-hearing would also want to capture emotion (sarcasm and humor, joy and sadness, deception and trust, etc.) and textually inform the user of shades of meaning they have trouble hearing naturally.

BTW, if anyone is familiar with Python and wants to give audio processing a try in a very simple way, the following is an excellent short tutorial in Python on how to generate an artificial sound signal by combining two sine waves, window the signal, and then apply a Fast Fourier Transform (FFT) to reveal the frequencies that make up your artificial signal. The nice thing about the example is that if you want to convince yourself it really works, you can add another sine wave or two and see each one appear in the frequency-analysis graph exactly where expected. For example, adding “+ np.sin(100.0 * 2.0*np.pi*x)” to the definition of the variable “y” creates a third frequency component at 100, equal in magnitude to the first component at 50 and twice the magnitude of the component at 80. BTW, the example works perfectly in Anaconda Python 3.7 using an interactive window in VS Code, even though Python 2.7 is cited as the language used.
(comment out the line “y = np.array(Bx)” and uncomment the preceding line beginning “y = np.sin(…”)
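Since the tutorial itself isn't reproduced here, the following is a stand-alone NumPy sketch of the same idea, including the extra 100 component. The 800 Hz sampling rate, the one-second length (chosen so every FFT bin falls on a whole number of Hz), the Hann window, and the 0.3 peak threshold are my assumptions and may differ from the tutorial's choices; it prints the peaks instead of plotting them.

```python
import numpy as np

fs = 800.0                 # sampling rate in Hz (assumed)
N = 800                    # one second of samples, so FFT bins fall on whole Hz
x = np.arange(N) / fs      # sample times in seconds

# Two sine waves (50 and 80) plus the extra component at 100
y = (np.sin(50.0 * 2.0 * np.pi * x)
     + 0.5 * np.sin(80.0 * 2.0 * np.pi * x)
     + np.sin(100.0 * 2.0 * np.pi * x))

w = np.hanning(N)                          # Hann window to limit spectral leakage
spectrum = np.fft.rfft(y * w)              # FFT of the real-valued windowed signal
freqs = np.fft.rfftfreq(N, d=1.0 / fs)     # frequency of each FFT bin
mag = 2.0 * np.abs(spectrum) / np.sum(w)   # scale so a unit-amplitude sine reads about 1.0

# Peaks: bins larger than both neighbours and above a small threshold
inner = mag[1:-1]
is_peak = (inner > mag[:-2]) & (inner > mag[2:]) & (inner > 0.3)
peak_bins = np.nonzero(is_peak)[0] + 1

for b in peak_bins:
    print(f"{freqs[b]:.0f}: magnitude {mag[b]:.2f}")
```

Run it and you should see three peaks, at 50, 80, and 100, with the 100 peak matching the 50 peak in magnitude and twice the 80 peak. Changing fs or N so that a tone falls between bins smears its peak across neighbouring bins; that leakage is what the window helps contain.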


Not long ago there was an internet story about a porn site advertising for a Python programmer. Someone responded that they were interested in the job, but would be embarrassed to tell their family what they did.

The site responded “We understand. We’d be embarrassed to tell people we coded Python too.”


I can also code in C# (and to a certain degree in JavaScript and HTML/CSS), if that helps redeem my reputation; I’ve managed to forget most of what I ever knew about C++ and C. But Python, depending on which site you consult, is a very popular programming language (among the top few). Java, especially because of Android, is at about the peak, and Apple’s new development language, Swift, although not very popular yet, borrows a number of ideas from Python but lacks Python's extensive open-source libraries and all the popular help and tutorials you find on the Internet for Python. Python is easy to learn; it can be run in interactive mode (easy for testing code fragments); and you don’t have a bunch of braces and semicolons or complicated syntax to worry about (although Python can be so terse it’s nearly incomprehensible). Python has fantastic FREE scientific, mathematical, statistical, and machine-learning libraries and is practically the language of choice for machine learning: some of the most established books on machine learning use Python for all their code (and parts that need to run fast can be written in C, compiled, and run pedal-to-the-metal).
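As a rough illustration of the “compiled in C” point, assuming NumPy is installed (it comes with Anaconda): NumPy's array operations run in compiled C loops, so you often get most of that speed without writing any C yourself. The sketch computes the RMS level of one second of a 440 Hz tone both ways; the signal and the function names are illustrative, and the timings vary by machine, so none are quoted here.

```python
import math
import timeit

import numpy as np

# One second of a 440 Hz tone sampled at 16 kHz (illustrative signal)
samples = np.sin(2.0 * np.pi * 440.0 * np.arange(16000) / 16000.0)
as_list = samples.tolist()

def rms_pure_python(xs):
    # Interpreted loop: one Python-level operation per sample
    total = 0.0
    for v in xs:
        total += v * v
    return math.sqrt(total / len(xs))

def rms_numpy(xs):
    # Same arithmetic, but the loop runs inside NumPy's compiled code
    return float(np.sqrt(np.mean(xs * xs)))

print(rms_pure_python(as_list), rms_numpy(samples))   # both about 0.7071 (1/sqrt(2))
print("loop :", timeit.timeit(lambda: rms_pure_python(as_list), number=20))
print("numpy:", timeit.timeit(lambda: rms_numpy(samples), number=20))
```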

So I think Francois Chollet, the Google engineer who created Keras (a high-level and very powerful interface for TensorFlow, Google’s neural-network framework) and wrote a very popular book called Deep Learning with Python, would hardly agree that Python is a language to be shunned. But I think I heard that Google, just like Apple with Swift, is working on a language that picks up where Python leaves off.

But as a coding environment for anyone who wants to give it a spin, I highly recommend Anaconda’s distribution of Python 3.7. You can code within a notebook environment where textual commentary on what you’re doing is interwoven with interactive code fragments (it is possible to run all the code in a notebook at once, too), and the output of your code (text, number tables, and graphics) can be preserved in the notebook in beautiful presentation form. Anaconda works on Windows, Linux, and Mac, and there are plenty of free tutorials on the Internet for beginning to code in Python and beyond. Even better, you can run a Jupyter notebook page within Visual Studio Code (on Mac, Linux, or Windows) and have access to the very powerful IntelliSense help and debugging facilities that Microsoft provides in Code, as well as the power of running a local Jupyter server within Code. It’s so easy to work with even I could do it.


(Anaconda Python will eat about 2 GB of disk space)
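If you prefer plain .py files to notebooks, VS Code's Python/Jupyter tooling also treats “# %%” comment lines as notebook-style cells that can be run one at a time in the Interactive Window. The tiny file below is only a sketch of that layout; the 440 Hz tone and 8 kHz rate are arbitrary example values, and the file still runs as an ordinary script.

```python
# %%
# Cell 1: build a one-second test tone (arbitrary example values)
import numpy as np

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

# %%
# Cell 2: inspect it. In the Interactive Window the value of the last
# expression in a cell is displayed, much as in a Jupyter notebook;
# run as a plain script, this bare expression has no visible effect.
tone[:5]
```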

For example, here’s what it takes in Python to print the beginner’s classic announcement (courtesy of Kernighan & Ritchie); the >>> is the Python interpreter's command prompt:

>>> print('Hello, World!')

whereas here’s about what it takes in C#:

using System;

namespace HelloWorldApp {

    class Geeks {

        static void Main(string[] args) {

            Console.WriteLine("Hello World!");

            // Wait for a key press so the console window
            // doesn't open and close too quickly to read
            Console.ReadKey();
        }
    }
}

I think for anyone who just wants to start learning something about programming, the best choice is clear. One can learn about type safety later (and I’ve already had some good lessons in it, thanks to Python!), but the dynamic typing of Python also makes coding easier for beginners…
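To make the dynamic-typing point concrete, here is a small sketch (the function average is a made-up example). A name can be rebound to a value of a different type, and type hints are not enforced when the code runs, so a mismatch only shows up as a runtime error; an optional static checker such as mypy can flag the bad call before the program is run.

```python
from typing import List

value = 42             # value refers to an int...
value = "forty-two"    # ...and can be rebound to a str without complaint

def average(xs: List[float]) -> float:
    # The annotations document intent; Python itself does not enforce them.
    return sum(xs) / len(xs)

print(average([1.0, 2.0, 3.0]))   # prints 2.0

try:
    average(["a", "b"])           # the mismatch is only discovered at run time
except TypeError as err:
    print("TypeError:", err)
```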

An intro course from Google:

An intro course from Microsoft (one of several, this one might assume a little background):

From IEEE Spectrum's annual data analysis of programming languages, related to the skills that actually get people high-paying jobs:

https://spectrum.ieee.org/computing/software/the-top-programming-languages-2019