Saturday, July 25, 2009

youtube, download, python

Well, I was actually too lazy to install any Firefox plugins, and just wanted a wget-style cli script at hand to do the job:

  1. get the html from youtube by the initial url,
  2. parse the html and work out a proper url to obtain the video stream,
  3. download the flv stream into the desired file,
  4. and do some nice printing, maybe even a progress bar.

So this is pretty amazing: it took me less than 15 minutes to google, read, download and patch, and that was it. Indeed, I have been truly underestimating the power of "pythonic" python: any average user, aka software developer, is capable of doing miracles with the tool.




#!/usr/bin/env python2.5
# encoding: utf-8

import re
import sys

import urlgrabber
import urlgrabber.progress


def get_video_url(url):
    """Work out the flv stream url for a youtube watch page url."""
    # the video id is the value of the "v" query parameter
    video_id = re.compile(r"\?v=([^&]*)").findall(url)[0]
    page = urlgrabber.urlread(url)
    # the page carries the player params in a watch_fullscreen?...'; string
    found = re.compile(r"watch_fullscreen\?([^']*)';").findall(page)
    if len(found) == 0:
        print 'video params are not found'
        sys.exit(1)
    token = None
    for param in found[0].split('&'):  # break url params
        if param[0:2] == 't=':
            token = param[2:]
            break
    if token is None:
        # without this check the last param would be used as the token
        print 'video token is not found'
        sys.exit(1)
    return "http://youtube.com/get_video.php?video_id=" + video_id \
        + "&t=" + token

#===============================================================================
# cli
#===============================================================================
if len(sys.argv) == 3:
    url, file_name = sys.argv[1:]
else:
    url = raw_input("Enter the URL: ")
    file_name = raw_input("Enter a filename: ")

url = get_video_url(url)

#===============================================================================
# output header
#===============================================================================
hr = '*' * 80 + "\n"
# right-align the labels in a 14 character column
msg = hr + ('* %s: %s' % ('Save to file'.rjust(14), file_name)) + "\n"
msg += ('* %s: %s' % ('Video Url'.rjust(14), url)) + "\n"
print msg

#===============================================================================
# download
#===============================================================================
prog = urlgrabber.progress.text_progress_meter()
urlgrabber.urlgrab(url, file_name, progress_obj=prog)


First of all, big thanks to the python community (which actually shares and provides all these interesting results for googling). The urlgrabber module also seems to be a nice piece of work.
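For anyone without urlgrabber at hand, the download-with-progress step can probably be approximated with the standard library alone. What follows is only a rough sketch, not what the script above does: it assumes Python 3 (the script itself targets python2.5), and the names format_progress, report and download are made up for illustration. The only library call is urllib.request.urlretrieve with its reporthook argument.

```python
import sys
import urllib.request


def format_progress(block_num, block_size, total_size, width=40):
    """Render one progress line for the given download state."""
    if total_size <= 0:
        # the server did not report a length: show the received bytes only
        return "%d bytes" % (block_num * block_size)
    # the last block may overshoot the total, so clamp it
    done = min(block_num * block_size, total_size)
    filled = width * done // total_size
    percent = 100 * done // total_size
    return "[%s%s] %3d%%" % ('#' * filled, '.' * (width - filled), percent)


def report(block_num, block_size, total_size):
    """reporthook callback: redraw the bar on the same terminal line."""
    sys.stdout.write("\r" + format_progress(block_num, block_size, total_size))
    sys.stdout.flush()


def download(url, file_name):
    """Fetch url into file_name, drawing the bar while blocks arrive."""
    urllib.request.urlretrieve(url, file_name, reporthook=report)
    sys.stdout.write("\n")
```

Calling download(url, file_name) would then stand in for the last two lines of the script, with a much plainer bar than the urlgrabber one.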

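As for the code: it is self-explanatory, if one ignores the get_video_url primitive. That one pulls the video id out of the url, scrapes the t= token from the page html and glues both into a get_video.php url.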
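To make get_video_url a bit less of a mystery, here is a small offline sketch of its parsing steps, using the same two regular expressions on a made-up page fragment. SAMPLE_PAGE, the token value and the helper names are assumptions for illustration only; the real page markup may well have looked different. It is written for Python 3 and needs no network.

```python
import re

# Hypothetical fragment of a watch page; the real markup may differ.
SAMPLE_PAGE = (
    "var fullscreenUrl = "
    "'/watch_fullscreen?video_id=abc123&l=42&t=TOKEN_xyz&fs=1';"
)


def extract_video_id(url):
    """Return the value of the v= query parameter, or None."""
    match = re.search(r"\?v=([^&]*)", url)
    return match.group(1) if match else None


def extract_token(page):
    """Return the value of the t= player parameter, or None."""
    found = re.findall(r"watch_fullscreen\?([^']*)';", page)
    if not found:
        return None
    for param in found[0].split('&'):  # break url params
        if param.startswith('t='):
            return param[2:]
    return None


def build_stream_url(watch_url, page):
    """Glue the id and the token into a get_video.php url, or return None."""
    video_id = extract_video_id(watch_url)
    token = extract_token(page)
    if video_id is None or token is None:
        return None
    return ("http://youtube.com/get_video.php?video_id=" + video_id
            + "&t=" + token)


if __name__ == "__main__":
    print(build_stream_url("http://www.youtube.com/watch?v=abc123",
                           SAMPLE_PAGE))
```

Returning None instead of exiting keeps the helpers easy to try out in an interactive session.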
