My wife has a beautiful pink iPod shuffle which she loves to listen music from, on her daily commute. She got bored with the songs and wanted a new playlist. I was entrusted with downloading new songs. Yep, all pirated. Now, downloading songs these days is a really boring stuff given the amount of bandwidth and apps like Spotify. So, I decided to automate the whole affair.
I was downloading from BengaliMp3Downloads. After some messing around with their html, I wrote this quick script to download the files from the site.
import os, urlparse from bs4 import BeautifulSoup import requests start_url = 'https://bengalimp3downloads.com/Anjan_Dutta~songs_by_artists-opencontents-1-1.html' domain = 'https://bengalimp3downloads.com' def download_files_from(url): response = requests.get(url) try: doc = BeautifulSoup(response.text, 'html.parser') for link in doc.find_all('div', style="vertical-align:bottom;"): for li in link.find_all("a"): download_page_url = domain + li['href'] res = requests.get(download_page_url) soup = BeautifulSoup(res.text, 'html.parser') for l in soup.find_all('source'): print 'MP3 URL is ' + l['src'] a = urlparse.urlparse(l['src']) filename = os.path.basename(a.path) print 'Saving to file ' + filename + '...' d = requests.get(l['src']) with open(filename, 'wb') as f: f.write(d.content) f.close() for nxt in doc.find_all(text='[Next]'): print 'Next page found. Continuing...' nxt_pg_link = domain + nxt.parent.attrs['href'] print nxt_pg_link download_files_from(nxt_pg_link) except: pass download_files_from(start_url)
It’s not something to show off but felt like writing about it and enjoying the extra time while the files are being downloaded!
P.S. Always use VirtualEnv!