Some time back, I wanted to parse a web page to extract some images from it (as part of my album art fixer). Instead of writing a full-blown C# application for it, I decided to do a quick-and-dirty (QnD) hack using Linux shell commands (I use unxutils on Windows). Retrieving the web page was easy with wget; I'd already done that for retrieving "A Brief History on Time". The problem was that I only wanted to parse the file up to a particular piece of text, i.e. I wanted to ignore nearly half of the file. I tried sed but couldn't get anything to work, so I ended up doing it the long way with grep, cut and head, something like:
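<snip>
rem Find the line number of the first line containing the marker text
set var=
fgrep -i -n "ignore from here onwards" file.htm | cut -d: -f1 > rm.txt
rem set /p reads only the first line of rm.txt, i.e. the first match
set /p var=<rm.txt
rem Print everything up to and including the marker line
head -n %var% file.htm
del rm.txt
</snip>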
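For comparison, here is a minimal sketch of the same grep/cut/head idea in a POSIX shell. It is untested against unxutils, and the function name `print_until` is made up for illustration. Command substitution replaces the temporary rm.txt file, and `head -n 1` keeps only the first match, which is what `set /p` does in the batch version. Unlike the batch version, it also handles a missing marker (where `%var%` would be empty) by printing the whole file.

```shell
# print_until FILE MARKER (hypothetical helper name)
# Prints FILE up to and including the first line that contains MARKER,
# matched as a fixed string, case-insensitively (like fgrep -i).
print_until() {
    file=$1
    marker=$2
    # grep -n prefixes each match with "N:"; keep the first match, then the number.
    line=$(grep -F -i -n -- "$marker" "$file" | head -n 1 | cut -d: -f1)
    if [ -n "$line" ]; then
        head -n "$line" "$file"
    else
        # Marker not found: fall back to printing the whole file.
        cat "$file"
    fi
}
```

Usage would be `print_until file.htm "ignore from here onwards"`.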
Does anyone know of a better way of doing this?