Leeching Music from YouTube for Fun, Learning, and Profit

by Synystr

Disclaimer:  Downloading copyrighted music is illegal, blah, blah, blah.  You guys already know this.  Let's begin, shall we?

YouTube has become one of the biggest resources to find music on the Internet these days.  Which is odd, since it started as a video-sharing community.  This becomes more apparent as time goes on in this age of social media, as people continue to post music videos they like on Facebook and other communities to share with friends and family.  Recording artists and labels have even begun to do this themselves in the form of lyric videos and preview clips, harnessing the power of sharing through the Internet to get their product out there and noticed.

I listen to a lot of chiptunes and ambient music, two less-than-mainstream genres of music.  You could argue that they are getting more popular due to the advent of social media and sharing, but for a while, it was hard to find anything of the sort.  YouTube has made that easier.  Whether it is live performances, one- or two-hour mixes, remixes, covers, etc., you can find pretty much anything now, and YouTube is a great starting point in looking for it.

It didn't take long for people to figure out how to strip the music from these videos and save it as MP3 files so they could burn it to CD and listen to it any time they wanted.  Various websites have popped up that let you simply paste the URL of a YouTube video, click a button, and download it as an MP3, making for an easy way to get new music.

I used these sites for a while, but I soon found myself tiring of the various pop-up ads, flashing "CLICK HERE!" buttons, bandwidth limitations, etc.  Some of them didn't even work properly.  Most sites I found were just trying to make a quick buck off of everyday computer users who just wanted their music.  Thankfully, I found a solution in youtube-dl, a public-domain application that lets you download from YouTube, SoundCloud, and other sites using the same method of URL-pasting, only without all of the annoying ads.

youtube-dl was a life-saver for me, and when I found out about the batch-download option, it was even better.  However, I still had the task of encoding the files to MP3 manually, as youtube-dl just downloads the file as its native MP4 format.  Enter FFmpeg - an open-source program that can convert video files from one format to another, including MP3 audio.  With this, I could download the videos with youtube-dl, then encode them with FFmpeg.  It was a pretty nice setup.

Still, I soon found that it was not enough.  While this method was a lot more efficient than dealing with the crap-infested websites I previously had to endure, it seemed like it was less efficient than it could be.  Doing one thing in one program, then doing a second thing in another, all to achieve one result - it seemed like the process could be simplified somehow.

During all of this, I was teaching myself Python 2.7 as a hobby.  I hadn't coded in forever, and I felt like Python was the best way to whet my appetite and ease my way back into programming.  At some point, it clicked - who's to say that I can't write a Python script that glues these two programs together cleanly and produces the same result with minimal effort?

I would have coded my own standalone app in C or another language - that would have been the cooler, more respectable option - but I wasn't (and still am not) that experienced yet, so what's the next best thing?

Take various already-existing resources and glue them together to make them work the way you want!  Hackers do this all the time, so I figured it was the natural solution to my conundrum.

Thus, I began writing a script to download MP3s from YouTube.  Eventually, I had a full-fledged script that, when executed, simply asked me to enter URL after URL of YouTube videos until I pressed Enter on an empty line, and then the script did all the work for me.  I eventually even added in the option to burn the downloaded compilation directly to CD-R, which is really cool when I need a mix-CD for long trips in the car.

I will now walk you through how to achieve this yourself.

Note:  This script makes system calls to the Bash shell on a Linux machine, which is what I was mainly using when I wrote it.  As such, this exact script will only work on Linux.  However, it is simple enough that you can easily adapt it to whatever other OS you are using, Windows included.

Here is the first block of code.

This is not required (other than the import statement, which definitely is required), but it makes maintaining the file structure and youtube-dl/FFmpeg binaries a little easier.

#********** LIBRARY IMPORTS **********
#os for system calls, time for delays so user can read output

import os, time

#********** INSTALLATION AND UPDATES **********
#This script utilizes ffmpeg, youtube-dl and cdrdao

print("Checking for youtube-dl and FFmpeg...")
time.sleep(3)

#Note: a "cd" run through os.system() only affects its own subshell,
#so every path below is written out in full instead.
if not os.path.exists('/usr/local/bin/youtube-dl'):
      print("youtube-dl is not installed. Installing now.")
      time.sleep(3)
      os.system("sudo wget https://yt-dl.org/downloads/2014.05.12/youtube-dl -O /usr/local/bin/youtube-dl")
      #Make the binary readable and executable for everyone ("chmod rwx" alone is not a valid mode)
      os.system("sudo chmod a+rx /usr/local/bin/youtube-dl")
      print("youtube-dl has been installed.")
      print("Now updating youtube-dl...")
      os.system("sudo /usr/local/bin/youtube-dl -U")
else:
      print("Checking for update to youtube-dl...")
      os.system("sudo /usr/local/bin/youtube-dl -U")

if not os.path.exists('/usr/local/bin/ffmpeg'):
      print("FFmpeg is not installed. Installing now.")
      time.sleep(3)
      os.system("sudo wget http://ffmpeg.gusari.org/static/32bit/ffmpeg.static.32bit.latest.tar.gz -O /usr/local/bin/ffmpeg.tar.gz")
      os.system("sudo tar -zxvf /usr/local/bin/ffmpeg.tar.gz -C /usr/local/bin")
      os.system("sudo chmod a+x /usr/local/bin/ffmpeg")
      os.system("sudo chmod a+x /usr/local/bin/ffprobe")
      #Remove the archive by its full path; it was never in the working directory
      os.system("sudo rm /usr/local/bin/ffmpeg.tar.gz")
      print("FFmpeg has been installed.")
else:
      print("FFmpeg is already installed.")

print("Installing/Updating CDRDAO through apt-get. This is for burning to CD-R.  Install manually if you do not use apt-get and wish to burn CDs with this program instead of an external one.")
time.sleep(5)
os.system("sudo apt-get install cdrdao")
os.system("clear")

First, we import the os and time modules: os for system calls and time to insert a delay between operations, which makes the output easier to read.

Next, we check to see if the youtube-dl binary exists in the /usr/local/bin directory.  If it does, the program moves on.  If not, it downloads a fresh copy of the binary to this location.  In both cases, youtube-dl is also updated to the latest version using the built-in -U option, as sometimes YouTube can change their encryption algorithms and render youtube-dl largely useless until it is updated.  We then do the same thing with the FFmpeg binary, to the same location.
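
The existence checks above only look in /usr/local/bin.  As a rough sketch (not part of the original script), the same idea can be extended to search the rest of PATH as well, which helps when a package manager has already put a tool somewhere else.  The helper name find_tool and its extra_dirs parameter are made up for this illustration, and the code is written to run on both Python 2.7 and Python 3.

```python
import os

def find_tool(name, extra_dirs=("/usr/local/bin",)):
    """Return the full path to an executable called `name`, or None."""
    # Look in the article's install directory first, then in every PATH entry.
    search_dirs = list(extra_dirs) + os.environ.get("PATH", "").split(os.pathsep)
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        # A usable tool must be a regular file with the execute bit set.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None

if __name__ == "__main__":
    for tool in ("youtube-dl", "ffmpeg", "cdrdao"):
        path = find_tool(tool)
        print("%-10s %s" % (tool, path if path else "not found"))
```

A None result is the cue to run the install branch; anything else can be used as the command path in place of the hard-coded one.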

CDRDAO is then downloaded and installed using APT.  I put a message in telling the user to install it manually if they are on a distro without apt-get and want CD burning to work.

#********** DOWNLOADING VIDEOS/CONVERTING TO MP3 **********

urls = []
currenturl = "1"
while currenturl != "":
  currenturl = raw_input('Enter URL (just hit Enter to stop and begin downloading): ')
  if currenturl == "":
      break
  urls.append(currenturl)

print ("Done with queue entry. Downloading videos from YouTube:")
time.sleep(3)

count = 1
for i in urls:
  #The URL is single-quoted so that characters such as & reach youtube-dl intact
  if count <= 9:
      os.system("/usr/local/bin/youtube-dl '" + i + "' -o 'Track_0" + str(count) + "-_%(title)s.%(ext)s' --restrict-filenames")
  else:
      os.system("/usr/local/bin/youtube-dl '" + i + "' -o 'Track_" + str(count) + "-_%(title)s.%(ext)s' --restrict-filenames")
  count = count + 1

print ("Finished downloading queue. Finding downloaded videos: ")

downloaded = []
for file in os.listdir('.'):
  if file.endswith(".mp4"):
      print file
      downloaded.append(file)
#Print the summary once, after the loop, rather than once per file
print ("Here are the found files: ")
print '[%s]' % ', '.join(map(str, downloaded))

print ("Now converting videos: ")
time.sleep(3)
downloaded.sort()
for x in downloaded:
  os.system('/usr/local/bin/ffmpeg -i ' + x + " " + x + '.mp3')

print ("Finished converting. Cleaning up: ")
time.sleep(3)

for file in os.listdir('.'):
  if file.endswith(".mp4"):
      print ("Deleting file " + file + "...")
      os.system("rm " + file)

The first part of this section is a loop that asks us for a YouTube URL on each iteration, which we then paste in.  The URL is appended to a Python list so the script can keep track of it.  If we simply press Enter without typing anything when it asks for a URL, the loop breaks, and we move on.
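
For anyone who wants to try the queue logic without typing, here is a minimal sketch of the same loop as a function.  The name read_queue is hypothetical and the example.com URLs are placeholders; the function accepts any sequence of strings, so canned input works as well as the keyboard.

```python
def read_queue(lines):
    """Collect entries from `lines`, stopping at the first blank one."""
    urls = []
    for line in lines:
        url = line.strip()
        if url == "":
            break          # an empty entry ends the queue, like a bare Enter
        urls.append(url)
    return urls

# Canned input standing in for what would be typed at the prompt:
entered = ["https://example.com/video1", "https://example.com/video2", "", "ignored"]
print(read_queue(entered))
```

To feed it from the keyboard in Python 2.7, one option is read_queue(iter(lambda: raw_input('Enter URL: '), '')), which keeps calling raw_input until it returns an empty string.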

After the loop breaks (Enter being pressed with no input), another loop begins, with one iteration per URL we entered.  Each iteration calls youtube-dl, stored in /usr/local/bin where we downloaded it earlier, along with a custom output template (this can be changed however you see fit - consult youtube-dl's documentation for more options) and the option --restrict-filenames.  This option is required because video titles often contain spaces and other special characters, which the shell would treat as argument separators, cutting the filenames short in the later FFmpeg and rm calls; --restrict-filenames keeps such characters out of the filenames.  As you can see, an if/else statement is coded in, putting a 0 before the track number if the variable "count" is less than or equal to 9, and leaving it out otherwise.  This keeps the filenames in a naming convention that sorts correctly, so the tracks burn to CD in the right order.
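
As an aside, the if/else can be folded into a single format string.  The sketch below is one possible alternative, not the script's actual code: track_template and download_command are invented names, and only the -o and --restrict-filenames options already used in this article appear.  The last two lines show why the leading zero matters once filenames are sorted as text.

```python
def track_template(count):
    """Output template for youtube-dl's -o option, zero-padded to two digits."""
    # %02d turns 1..9 into 01..09.  The doubled %% survives Python's own
    # formatting as a single %, leaving youtube-dl's %(title)s and %(ext)s intact.
    return "Track_%02d_-_%%(title)s.%%(ext)s" % count

def download_command(count, url):
    """Argument list for one download; with no shell involved, nothing needs quoting."""
    return ["/usr/local/bin/youtube-dl", "-o", track_template(count),
            "--restrict-filenames", url]

print(track_template(3))
print(sorted(["Track_10", "Track_2", "Track_1"]))    # unpadded: 10 lands before 2
print(sorted(["Track_10", "Track_02", "Track_01"]))  # padded: 01, 02, 10
```

A list like the one download_command returns can be handed to subprocess.call() instead of building a string for os.system().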

The program then lists all of the files it downloaded, complete with extensions.  This part is not required to get functionality out of the program, but I added it in while debugging the script so I could tell if it was working correctly, and I liked it so I kept it in.  Feel free to remove it if you feel otherwise.

After this, a third loop is executed, one iteration per MP4 file downloaded.  This time, it calls FFmpeg, also in /usr/local/bin where we downloaded it earlier.  The call to FFmpeg takes each MP4 file that youtube-dl downloaded and converts it into an MP3 with the same name.  (The .mp4 is still retained in the final filename, but I was too lazy to code around that.)  Finally, the script deletes all MP4 files, as we no longer need them.
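
For readers who would rather not have .mp4 left in the MP3 filenames, here is a small sketch of one way around it using os.path.splitext.  The function names mp3_name and convert_command are hypothetical, and the FFmpeg call itself is the same plain "-i input output" form the script uses.

```python
import os

def mp3_name(video_file):
    """Swap the video extension for .mp3, e.g. clip.mp4 -> clip.mp3."""
    base, _ext = os.path.splitext(video_file)
    return base + ".mp3"

def convert_command(video_file):
    """Argument list for the same FFmpeg conversion the script performs."""
    return ["/usr/local/bin/ffmpeg", "-i", video_file, mp3_name(video_file)]

print(mp3_name("Track_01_-_Some_Title.mp4"))
print(convert_command("Track_01_-_Some_Title.mp4"))
```

If this is dropped into the script, the later steps still work unchanged, since they only look for files ending in .mp3.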

Shortly after this article was accepted, I used my script to get some more music, and ran into some issues with the name formatting I explained above (adding in track numbers and such).  After some research, I found that my script had updated youtube-dl to a new version, as it is intended to do, but the new version, for some reason, expects the -o option and the URL to download in the opposite order.

I was able to remedy this by modifying the applicable section of code above to:

count = 1
for i in urls:
   #The URL is single-quoted so that characters such as & reach youtube-dl intact
   if count <= 9:
       os.system("/usr/local/bin/youtube-dl -o 'Track_0" + str(count) + "_-_%(title)s.%(ext)s' --restrict-filenames '" + i + "'")
   else:
       os.system("/usr/local/bin/youtube-dl -o 'Track_" + str(count) + "_-_%(title)s.%(ext)s' --restrict-filenames '" + i + "'")
   count = count + 1

This basically just switches the order of the -o option and the URL to download.  I am not sure why this change occurred; I was unable to find a changelog for the program.  I am unsure if this is a bug in the youtube-dl program, or an intended feature/syntactical change.

#********** BURNING TO CD-R **********

switch = raw_input("Would you like to burn the downloaded MP3 to CD-R? 'y' for yes or anything else for no:")

if switch == "y":

  for file in os.listdir('.'):
      if file.endswith(".mp3"):
          os.system("/usr/local/bin/ffmpeg -i " + file + " " + file + ".wav")

  wave = []

  for file in os.listdir('.'):
      if file.endswith(".wav"):
          wave.append(file)
  wave.sort()


  #open() in 'w' mode creates cd.toc itself, so no touch or chmod is needed first

  f = open('cd.toc','w')
  f.write('CD_DA\n\n')

  for z in wave:
      f.write('\n\nTRACK AUDIO\n')
      f.write('AUDIOFILE "' + z + '" 0')
  f.close()
  raw_input ("Please place a blank CD-R into your CD drive, then hit Enter:")
  print ("Now burning CD...")

  os.system("cdrdao write cd.toc")

  for y in wave:
      print ("Deleting file " + y + "...")
      os.system("rm " + y)
  os.system("rm cd.toc")

else:
  print ("Skipping CD burning.")

The burning part of the script begins by asking whether the user wants to burn a CD.  If so, a loop converts all of the downloaded MP3 files into WAV format, as this format is required for CDRDAO.  If not, this entire block is skipped.
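
One thing to watch for here: audio CDs use 44.1 kHz, 16-bit, stereo audio, and as far as I can tell CDRDAO expects its WAV input in that form.  The plain conversion usually produces it anyway, but a file with a different sample rate or a mono track may not.  The sketch below, which is an assumption-laden variation rather than the script's own call, spells the format out with FFmpeg's -ar, -ac, and -acodec options; wav_command is a made-up name.

```python
import os

def wav_command(mp3_file):
    """Argument list that converts one MP3 into a CD-style WAV file."""
    base, _ext = os.path.splitext(mp3_file)
    return ["/usr/local/bin/ffmpeg", "-i", mp3_file,
            "-ar", "44100",          # sample rate used by audio CDs
            "-ac", "2",              # two channels (stereo)
            "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
            base + ".wav"]

print(wav_command("Track_01_-_Some_Title.mp3"))
```

Like the other command lists in this article's sketches, this would be run with subprocess.call() rather than os.system().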

A new Python list is created and filled with all of the new WAV files that were just encoded, and then we use the sort() method to sort them by track name for burning.

After sorting, we create a new file called cd.toc, which is the table of contents file for CDRDAO, used to tell the program what to burn and in what order.  This has to be formatted a certain way, so we first add the CD_DA part at the top of the file, then two line breaks, and then we use a for loop to write the data required for each track.
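
To make the layout of cd.toc easier to see, here is a sketch that builds the same kind of table of contents as a string instead of writing it piece by piece.  The function name build_toc and the sample filenames are invented for the example; the CD_DA, TRACK AUDIO, and AUDIOFILE lines are the same ones the script writes.

```python
def build_toc(wav_files):
    """Return the text of a cdrdao TOC file listing `wav_files` in sorted order."""
    lines = ["CD_DA", ""]                    # disc type header, then a blank line
    for name in sorted(wav_files):
        lines.append("TRACK AUDIO")
        # The trailing 0 is the start offset, as in the script's own output
        lines.append('AUDIOFILE "%s" 0' % name)
        lines.append("")                     # blank line between tracks
    return "\n".join(lines)

print(build_toc(["Track_02.wav", "Track_01.wav"]))
```

Printing the result before burning is a cheap way to confirm the track order without wasting a disc.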

After the cd.toc file is created, the program asks the user to put in a blank CD-R and press Enter.  When this is done, CDRDAO is finally called, inputting the cd.toc file we generated earlier as an argument.  The CD burns.

After the CD is burned, we remove all the WAVs, as they are no longer needed, as well as the cd.toc file.  We then move on.

#********** POST-OPERATION ORGANIZATION **********

name = raw_input("Give a name to the compilation you've made:")
name = name.replace(" ", "_")
os.system("mkdir " + name)
os.system("mv *.mp3 " + name)
print("Moved MP3 into a folder called " + name + ".")
print ("All finished. Enjoy! Hit Enter to terminate program.")
raw_input("")

This final part is optional but recommended.  Since the script runs and writes to the current working directory, I made this block of code for organization purposes.  First, it asks for a "compilation name," which the user can name any way they want.  I like to name them after genre type.

This name is then converted to a format where underscores replace blank spaces, and then a new folder is created with the name the user types, and all MP3 files are moved into this new folder.
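
The same organizing step can also be done without shelling out to mkdir and mv, which sidesteps problems with odd characters in the compilation name.  This is only a sketch using Python's standard os, glob, and shutil modules; the function name organize and its directory parameter are hypothetical.

```python
import glob
import os
import shutil

def organize(name, directory="."):
    """Move every .mp3 in `directory` into a subfolder named after `name`."""
    folder = os.path.join(directory, name.replace(" ", "_"))
    if not os.path.isdir(folder):
        os.makedirs(folder)
    for path in glob.glob(os.path.join(directory, "*.mp3")):
        shutil.move(path, folder)   # moving into a directory keeps the filename
    return folder
```

Calling organize("Chiptune Mix") would create a folder called Chiptune_Mix in the current directory and move the MP3 files into it.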

At this point, just hit Enter to end the program!

Sure, it was a quick, dirty, and noobish Python script, but it works just fine.  I was able to figure out how to automate two programs into completing one task with some Python grease.  And because of this, I am now even more eager to learn as much as I can about programming, and I encourage anyone who is reading this, no matter what skill level you are at, to take a look at it yourself if this article piqued your interest.  Even if you don't know how to code, try learning it.  Pick a language (I'd recommend Python, it's doing wonders for me), use Google, and teach yourself.  You'd be surprised at what you can accomplish.

That's the cool thing about this.  Sometimes, you get an idea, and even if you can't create something entirely from scratch, if you have the resources, or at least the knowledge to find said resources, you can still make something that works the way that you want it to.

Feel free to use, modify, and distribute this script in any way you see fit.  I already am doing so myself.  I plan on adding a GUI and porting it to Windows.

Rock on, everyone!

Sources

Code: youtube-get.py
