colore Posted July 8, 2007
Hello,
1) Is there a way to grab and store (in a text file) all the links, or URLs in the text, of the webpages I visit that match a specific mask, e.g. URLs like www.*.com/*.pdf? The program must scan the text and links of every webpage I visit and, if it finds a URL matching that mask, store it in a text file.
2) I would also like a program that stores (in a text file) the URLs of the webpages I visit that match a specific mask, e.g. www.google.com/*
Thanks!
DigeratiPrime Posted July 9, 2007
I don't know of a program that can do that, but I have an idea. Basically, convert the Firefox history.dat file into a text file, import the list into a spreadsheet app like Excel, and do the filtering there. Easier said than done, though.
Currently Firefox 2.x uses the "mork" format to store its history. This will be changed to SQLite in Firefox 3.x, which is available as alpha releases.
To convert the "mork" history.dat into text, see this page. There is a bookmarklet (which did not work for me) and, in the comments, a program called "Dork" that DID work for me.
http://philwilson.org/blog/2005/01/how-to-...ry-to-text.html
You may want to try the Firefox 3 alphas, because in theory it should be very easy to do with those: you just need an SQLite parser.
http://www.squarefree.com/burningedge/
For Internet Explorer, here is an article, although it did not work for me:
http://mcpmag.com/columns/article.asp?EditorialsID=1595
Hope this helps a little. If you find something better, let me know.
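Once the history has been exported to a plain text file, the filtering step does not strictly need a spreadsheet. Below is a minimal sketch in Python, assuming the export holds one URL per line; the function names `filter_urls` and `filter_file` and the file paths are hypothetical, and the mask is treated as a shell-style wildcard via the standard `fnmatch` module.

```python
from fnmatch import fnmatch


def filter_urls(lines, mask):
    """Return the URLs in `lines` that match the wildcard `mask`.

    Assumes one URL per line (an assumption about the exported file).
    """
    matches = []
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        # Drop the scheme so a mask like "www.google.com/*"
        # also matches "http://www.google.com/...".
        bare = url.split("://", 1)[-1]
        # In fnmatch, "*" matches any run of characters, including "/".
        if fnmatch(bare.lower(), mask.lower()):
            matches.append(url)
    return matches


def filter_file(src_path, dst_path, mask):
    """Read URLs from src_path and write the matching ones to dst_path."""
    with open(src_path, encoding="utf-8", errors="replace") as src:
        matches = filter_urls(src, mask)
    with open(dst_path, "w", encoding="utf-8") as dst:
        for url in matches:
            dst.write(url + "\n")
    return len(matches)
```

For example, `filter_file("history.txt", "pdf_links.txt", "www.*.com/*.pdf")` would cover the first mask from the original question, and the mask `www.google.com/*` the second. Note that `?` and `[` in a mask are also treated as wildcards by `fnmatch`.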
colore Posted July 10, 2007 Author
I am working on a JavaScript solution, which IMO would be the best approach.
cleveaires Posted July 10, 2007
Maybe we could make a custom program for that.
colore Posted July 11, 2007 Author
I am looking forward to your suggestions. Thanks.
DigeratiPrime Posted July 11, 2007
The page I linked to at philwilson.org states that there is a JavaScript script that converts the Mork history.dat into RDF/XML, available at this Bugzilla page:
https://bugzilla.mozilla.org/show_bug.cgi?id=241438
In fact, that page has 3 attachments: one for JavaScript, one for Python, and one that converts to tab-delimited text.
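If the tab-delimited conversion is used, the matching URLs could be pulled out with a short script. This is only a sketch: the column layout of that converter's output is not described in this thread, so the code scans every field for something that looks like a URL instead of assuming a column position. The name `urls_from_tsv` is hypothetical.

```python
import csv
import io
from fnmatch import fnmatch


def urls_from_tsv(text, mask):
    """Return URL fields in tab-delimited `text` that match `mask`.

    The column holding the URL is unknown here, so every field
    containing "://" is treated as a candidate URL (an assumption).
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    found = []
    for row in reader:
        for field in row:
            candidate = field.strip()
            if "://" not in candidate:
                continue  # not a URL-looking field
            # Compare without the scheme, case-insensitively.
            bare = candidate.split("://", 1)[-1]
            if fnmatch(bare.lower(), mask.lower()):
                found.append(candidate)
    return found
```

The returned list can then be written to a text file, one URL per line.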