Fun with Text-to-Speech

by Nestor

I just purchased the 2600 Hacker Digest, Volume 35.  This time I decided on the EPUB format because I have noticed that using Calibre, EPUB converts rather nicely into text.  After downloading it, I promptly converted the EPUB into plain old UTF-8 text and started reading.  Very soon it occurred to me that I was running late and must stop reading.  It was getting into the afternoon and I had other things I needed to catch up on.  What was I to do?  I was just settling into reading and now I had to stop.  Fooey!

Well, as a little experiment, I decided to try rendering portions of the file as text-to-speech.  I have a little pet project hosted on GitHub, which is a rehabilitated version of a public domain speech synthesizer named Pico TTS, which I lovingly renamed to NanoTTS.  My whole contribution really is that I took the Pico TTS code, which was not functioning when I found it, and made it into a functioning command line tool with sensible commands and options, coupled with a few different choices for outputs.

NanoTTS supports six different voice synthesis modules: en-US, en-GB, de-DE, es-ES, fr-FR, it-IT, as well as allowing several different options which affect inflection, such as dialing in the speed of the reader and the pitch of their voice.  I tried this in the past to varying degrees of success, but thought this time I would attempt it once more so I could keep reading 2600, even though I was busy doing things.  And wouldn't you know it - it worked wonderfully this time.

Instead of converting the entire 2600 digest into a single, huge audio file, I decided to carve out little chunks of the file - one article at a time - and convert each of those into its own audio file.  In relatively short time - and using Bash no less - I was able to generate MP3s for the entire digest.  What surprised me most of all is how listenable the output is.  I think I actually can understand everything the reader is saying.  This is no small feat, given how wooden and awful synthesized voices often sound.  And this one isn't the greatest either.

For anyone who wants to convert the entire 2600 Hacker Digest, Volume 35 into nicely labeled audio snippets, you need four things:

1.)  2600 Hacker Digest, Volume 35 in EPUB format.

2.)  Calibre - Which you use to convert it into TXT format.  Make sure UTF-8 is selected in the output options!  This is the only option I checked.  I left the others alone.  For instance, don't turn on Heuristic processing.

3.)  NanoTTS - Which you can get from GitHub at github.com/gmn/nanotts

4.)  LAME MP3 encoder.
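
Before running anything, it can help to confirm that the command-line tools are actually installed.  The following is a minimal sketch and not part of the script in the repository.  The function name missing_tools is made up for illustration, and it assumes you would check for ebook-convert, Calibre's command-line converter; if you convert through the Calibre GUI instead, drop it from the list.

```shell
#!/bin/bash
# Print the name of every command given as an argument that cannot be
# found on PATH (one per line). Returns non-zero if any are missing.
# "missing_tools" is a hypothetical helper name, not part of NanoTTS.
missing_tools() {
    local tool status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return $status
}

# nanotts and lame are the command names the conversion script calls;
# ebook-convert is an assumption (only needed for command-line conversion).
if ! missing_tools ebook-convert nanotts lame >/dev/null; then
    echo "Some tools are missing; call missing_tools to list them." >&2
fi
```

Calling missing_tools nanotts lame on its own prints whichever of the two is not installed, and prints nothing if both are present.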

Please note that I have actually added this script to the NanoTTS GitHub repository in its entirety.  If your EPUB converts to TXT output identical to mine, you should be able to run the script out of the box without altering anything.  It will generate the entire set of audio files in the current directory.

The code works simply by taking a list of line numbers.  The line numbers come in pairs: the first is the line to start on, the second is the line to end on, and both are inclusive.  You can check this by opening the text file and verifying a few pairs visually.  If the first couple match, there's a good chance they all will.  But in order to be really sure, here is the SHA-256 of the text file:
eee2f06df21436fdb374935fc7fd2d1e8384c9afe4c6f5e1c3cbf0e8efdd1ae
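
One way to make that comparison without squinting at hex is sketched below.  It assumes the sha256sum tool from GNU coreutils is available (on some systems the equivalent is shasum -a 256), and the function name verify_sha256 is made up for illustration.  A full SHA-256 digest is 64 hexadecimal characters, so the function also rejects any expected value of a different length, which guards against a digest that lost a character somewhere along the way.

```shell
#!/bin/bash
# Compare a file's SHA-256 with an expected hex digest.
# Returns 0 on a match, 1 on a mismatch, 2 if the expected value is not
# a well-formed digest (exactly 64 hexadecimal characters).
# "verify_sha256" is a hypothetical helper name; sha256sum is assumed.
verify_sha256() {
    local file="$1" expected="$2" actual
    case $expected in
        *[!0-9a-fA-F]*) return 2 ;;
    esac
    [ "${#expected}" -eq 64 ] || return 2
    # sha256sum prints lowercase hex, so normalize the expected value
    expected=$(printf '%s' "$expected" | tr 'A-F' 'a-f')
    actual=$(sha256sum "$file" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}

# Usage, with the digest of your choice in place of the placeholder:
#   verify_sha256 "2600_The_Hacker_Digest-Volume_35.txt" "<digest>" && echo "match"
```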
We merely iterate through the line-pairs and run NanoTTS for each snippet, generating an MP3 file, and voilà, we can turn an entire magazine into actually listenable audio for those busy folks on the go who might have to drive somewhere, or mow the lawn like me.
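
The pair arithmetic is easy to get off by one, so here is a small stand-alone sketch of it, separate from the real script.  The helper names extract_range, each_pair, and show are made up for illustration, and the demo runs against a throwaway file of numbered lines rather than the digest.  The key point is that an inclusive range from START to END contains END - START + 1 lines.

```shell
#!/bin/bash
# Print lines START..END (both inclusive) of FILE.
# head keeps the first END lines; tail then keeps the last END-START+1.
# All helper names here are hypothetical, chosen for this sketch.
extract_range() {
    local start="$1" end="$2" file="$3"
    head -n "$end" "$file" | tail -n "$(( end - start + 1 ))"
}

# Walk a flat list of numbers two at a time and call COMMAND with each
# pair. A leftover unpaired number at the end is ignored.
each_pair() {
    local callback="$1"
    shift
    while [ $# -ge 2 ]; do
        "$callback" "$1" "$2"
        shift 2
    done
}

# Demo on a temporary file containing the lines 1 through 10.
demo=$(mktemp)
seq 1 10 > "$demo"
show() { echo "--- lines $1-$2"; extract_range "$1" "$2" "$demo"; }
each_pair show 2 4 7 8
rm -f "$demo"
```

The demo prints lines 2, 3, and 4 under the first header and lines 7 and 8 under the second.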

Enjoy!
#!/bin/bash
# Convert the entire digest issue of 2600 volume 35 into audio files for easy listening!

# I have found these settings considerably improve the intelligibility of the nanotts output; YMMV
#speed="0.8"
speed="0.78"
voice="en-US"
volume="0.6"
pitch="1.14"

# your file location and name will vary, obviously.
FILENAME="2600_The_Hacker_Digest-Volume_35.txt"

# Even though it may not look like it, these numbers are in pairs;
# Each pair is a starting and ending line (inclusive) of a section of text
SECTIONS="204 224 230 264 270 328 334 462 468 594 600 648 654 782 788 892 898 988 994 1006 1012 1048 1054 1124 1130 1160 1166 1194 1202 1222 1228 1254
    1260 1300 1306 1374 1380 1542 1548 1582 1588 1632 1638 1652 1658 1688 1694 1748 1754 1832 1838 1862 1868 1950 1956 1990 1998 2126 2132 2164
    2170 2236 2242 2312 2318 2342 2348 2384 2390 2452 2458 2568 2576 2630 2636 2676 2682 2698 2704 2840 2846 2874 2880 2906 2912 3046 3052 3112
    3120 3144 3150 3172 3178 3204 3210 3300 3306 3472 3478 3518 3524 3548 3554 3602 3608 3634 3640 3678 3684 3740 3746 3884 3890 3926 3932 3972
    3978 4000 4006 4128 4134 4166 4172 4244 4250 4272 4278 4316 4322 4360 4366 4426 4432 4462 4472 5192 5200 5230 5236 5322 5328 5354 5360 5480
    5486 5504 5510 5536 5542 5564 5570 5592 5598 5610 5616 5630 5636 5656 5662 5806 5814 5832 5838 5880 5886 5908 5914 5930 5936 6046 6052 6776
    6780 7438 7442 8088 8092 8796 8882 8898 9692 9754 "

COUNT=1
HEAD=''
TAIL=''

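# Render one section of the text file to an MP3.
#   $1 = last line of the section     $2 = number of lines in the section
#   $3 = track number                 $4 = title (first line of the section)
function run_nanotts() {
    local count=$3
    local title="$4"
    # a slash in a title would otherwise be treated as a directory separator
    title=${title//\//-}
    # zero-pad the track number to three digits so the files sort in order
    while [ ${#count} -lt 3 ]; do count=0$count; done
    local file="$count-2600_vol.35-$title.mp3"
    # Show the command on stderr, then run it. lame reads the piped audio as
    # raw PCM (-r), 16 kHz (-s 16), mono (-m m), and tags artist/album/track.
    echo "nanotts --speed $speed --volume $volume --pitch $pitch --voice $voice < <( head -n $1 \"${FILENAME}\" | tail -n $2; echo \" . . . . . . \" ) -c | lame -r -s 16 -m m -V 0 -b 56 --ta \"2600 Magazine\" --tl \"2600 Vol. 35\" --tn $count - \"$file\"" >/dev/stderr
          nanotts --speed $speed --volume $volume --pitch $pitch --voice $voice < <( head -n $1 "${FILENAME}" | tail -n $2; echo " . . . . . . " ) -c | lame -r -s 16 -m m -V 0 -b 56 --ta "2600 Magazine" --tl "2600 Vol. 35" --tn $count - "$file"
}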

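for sect in ${SECTIONS}; do
    echo $sect >/dev/stderr
    if [ -z "$TAIL" ]; then
        # first number of a pair: remember the starting line
        TAIL=$sect
    else
        # second number of a pair: the ending line
        HEAD=$sect
        # TAIL now becomes the number of lines in the section (inclusive)
        let TAIL="$HEAD-$TAIL+1"

        # show the extraction command, then print the section itself
        echo "head -n $HEAD \"$FILENAME\" | tail -n $TAIL" >/dev/stderr
              head -n $HEAD "$FILENAME" | tail -n $TAIL
        echo >/dev/stderr;

        # the first line of each section is used as its title
        TITLE=$(head -n $HEAD "$FILENAME" | tail -n $TAIL | head -n 1)

        run_nanotts $HEAD $TAIL ${COUNT} "${TITLE}"

        # reset for the next pair and move on to the next track number
        HEAD=''; TAIL=''
        let COUNT="$COUNT+1"

        sleep 3
    fi
done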
Code: Text2Speech.sh
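
The script names each MP3 from a zero-padded track number plus the first line of the section.  As a hedged alternative sketch, the same naming can be done with printf instead of a padding loop, while also replacing any slash in a title, since a slash would be read as a directory separator.  The function name make_filename and the titles in the usage note are made up for illustration; the track number is assumed to be a plain decimal integer without leading zeros.

```shell
#!/bin/bash
# Build "NNN-2600_vol.35-<title>.mp3" from a track number and a title.
# "make_filename" is a hypothetical helper, not part of the original script.
make_filename() {
    local count title
    # zero-pad the track number to three digits so the files sort in order
    count=$(printf '%03d' "$1")
    # swap slashes for dashes so the title stays a single path component
    title=$(printf '%s' "$2" | tr '/' '-')
    printf '%s\n' "${count}-2600_vol.35-${title}.mp3"
}
```

For example, make_filename 7 "Some Title" gives 007-2600_vol.35-Some Title.mp3, and a title such as "A/B Testing" comes out as "A-B Testing" in the file name.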
