DosCode

renaming files in CMD scripts


Edit: forget it then. My solution (now removed) made too much sense I guess i.e. parsing the main page for its 5 child pages' URLs, parsing the child pages for the PDFs in them, and then downloading the PDFs while saving with the proper names in the first place (using VBScript as a language, ServerXMLHTTP to make HTTP requests, and Regular Expressions to extract information from the HTML).

Have fun stringing hunks of archaic 1980's technology (cryptic batch files) together with non built-in tools only to end up with the same results, but via many unnecessary steps and over-complications, adding external dependencies in the process, while steering clear of anything remotely modern if that's your thing.


I don't download the documents manually; I have a CMD script that uses wget. I also rename the files and process all the information from the site with CMD. I simply want a batch file to be the main executable. The files on my PC are renamed by the script: when I downloaded EK_GEN_0_1_en.pdf I renamed it to 0_1_en.pdf, and when I downloaded gen_4_2.pdf I renamed it to 4_2.pdf.

If you have Windows, which supports a plethora of modern scripting technologies, then why would you pick an archaic 1980's MS-DOS (i.e. pre-Windows) solution? There is far better and more modern stuff built-in, like vbscript, jscript and now powershell as well (and even compilers for C# & VB on recent versions of Windows).

I don't really see how they would make the job easier or better, and if you want to learn "the Windows way" then it's typically not the way we do it (pretending it works like Linux, and using its tools instead of what Windows already has)

I want to do it in a way that I like and that is close to what I already know. I don't want to use VB or anything too far from what I have learned so far.

Edited by DosCode


The methods you want to use are not really suitable for the task. You may be able to get your car up the street by pulling it with its built-in winch, but it's a darn sight faster and more assured if you use its built-in engine!


CoffeeFiend, even if the OP wants to do it his way - which is perfectly fine, I would love to see an example as you have described. I can think of other possible situations where knowing how to do this would be handy. I know CMD and wget, can make do with vbscript or jscript, and can muddle through Regular Expressions. But I'm not familiar with ServerXMLHTTP and I'm always willing to learn new things. I personally learn best with a couple of examples I can edit and model, along with links to reference documents where I can look things up. If you don't want to post an example here, since the OP has no interest in it, would you mind sending it to me via PM? I'd be much obliged.

If you do decide to post it here, who knows? Maybe the OP will see the error of his ways and see that it's not as difficult as he imagines.

Cheers and Regards

Edited by bphlpt


Here is where I am so far:


@echo off
setlocal EnableDelayedExpansion
set "source=GEN 0 GENERAL.html"
set "pdf=0_1_en.pdf"
echo In file:%source%
echo Look for anchor:%pdf%

rem Process each line in %source% file:
for /F "usebackq delims=" %%c in ("%source%") do (
    set "line=%%c"
    REM Test if the line contains pdf file I look for:
    SET "pdfline=!line:%pdf%=!"

    if not "!pdfline!" == "!line!" (
        cls
        echo Line: !line!

        REM Test if the pdfline contains tag b
        if not "!pdfline:*><b>=!" == "!pdfline!" (
            cls
            echo I HAVE IT!
            set "tag=!pdfline:<b>=$!"
            set "tag=!tag:</b>=$!"
            for /F "tokens=2 delims=$" %%b in ("!tag!") do set title=%%b
            echo Title found: "!title!"
            pause
        )
    )
)
pause


There are still more ways to do this part, but I think it's your turn now.

Edited by DosCode


I would love to see an example as you have described. I can think of other possible situations where knowing how to do this would be handy.

That's a good point. Here's the VBScript again for those it might help at some point:


Option Explicit
Dim oXmlHttp, oRegExp, oMatch, adoStr, sChildPages(), i, url

Set oXmlHttp = CreateObject("Msxml2.ServerXMLHTTP.6.0")
oXmlHttp.Open "GET", "http://www.slv.dk/Dokumenter/dsweb/View/Collection-357", False
oXmlHttp.Send

Set oRegExp = New RegExp
oRegExp.IgnoreCase = True
oRegExp.Global = True
oRegExp.Pattern = "<a\shref=""(/Dokumenter/dsweb/View/Collection-\d*)"">"

Set oMatch = oRegExp.Execute(oXmlHttp.ResponseText)
If oMatch.Count = 0 Then WScript.Quit

'really ugly hack where we skip the first child page found (itself)
ReDim sChildPages(oMatch.Count-2)
For i = 1 to oMatch.Count-1
    sChildPages(i-1) = "http://www.slv.dk" & oMatch.Item(i).Submatches(0)
Next

oRegExp.Pattern = "<a\shref=""(/Dokumenter/dsweb/Get/Document.*pdf)""\sclass=""uline""><b>(.*?)</b>"
For Each url in sChildPages
    oXmlHttp.Open "GET", url, False
    oXmlHttp.Send
    Set oMatch = oRegExp.Execute(oXmlHttp.ResponseText)
    For i = 0 to oMatch.Count-1
        DownloadBinaryFile "http://www.slv.dk" & oMatch.Item(i).Submatches(0), oMatch.Item(i).Submatches(1) & ".pdf"
    Next
Next

Function DownloadBinaryFile(sUrl, sFileName)
    oXmlHttp.Open "GET", sUrl, False
    oXmlHttp.Send
    Set adoStr = CreateObject("ADODB.Stream")
    adoStr.Type = 1 'adTypeBinary
    adoStr.Open
    adoStr.Write oXmlHttp.ResponseBody
    adoStr.SaveToFile sFileName, 2 'adSaveCreateOverWrite
    adoStr.Close
End Function

It's pretty ugly, there's no error handling of any kind and all that but it gets the job done. Writing essentially the same thing in other languages should be pretty straightforward too (most of the work here is getting the regular expressions right). And in most cases it would be nicer/better/simpler too (VBScript data structures suck hard, downloading binary files here is a bit of a hack, error handling is beyond awful, etc).


Check my last post. I edited it; it now contains code that could be merged into the code I posted before, though one little change still needs to be made.

Edited by DosCode


CoffeeFiend - That is Marvelous! It certainly does get the job done!

I copied your script above and saved it as "GetPdf.vbs". I put the file in an otherwise empty folder named "Temp", I opened a command box in that folder and ran the file with the command:

cscript getpdf.vbs

and waited. The file ran silently. When the command completed and I got a command prompt back, I refreshed the contents of "Temp" in Windows Explorer and all the pdf files that the OP was looking for were in the folder and named correctly! The files opened correctly in Foxit Reader, my pdf reader of choice. Absolutely no problems at all. No extra files, nothing to rename, no extra external apps were required that weren't already part of Windows 7, all looked great!

Now that I know it WORKS, I've just got to do some reading so I can understand WHY it works and HOW I need to modify it to meet future needs. I hate to ask for more after you've put this together, but to save time blindly using Google, would you mind pointing me to a few links where I can read about the key parts of your script? Maybe The Scripting Guys address something similar?

Many Thanks!

Cheers and Regards

Edited by bphlpt


This might do:

@echo off
setlocal EnableDelayedExpansion
set "source=GEN 0 GENERAL.html"
echo In file:%source%
ECHO.

FOR /F "tokens=* delims=" %%A IN ('FIND ".pdf" "%source%" ^|FIND "href"^|FIND "class="') DO (
SET Line="%%A"
CALL :Line_process
)
GOTO :EOF

:Line_process
SET Line=!Line:^<=§!
SET Line=!Line:^>=§!
:loop_href
IF NOT "!line:~1,4!"=="href" SET Line="!Line:~2,-1!"&GOTO :loop_href
SET File="!Line:~7,-1!"
CALL :File_name !File!
SET Line="!File:%Filepath%=!"
:loop_class
IF NOT "!line:~1,4!"=="§§b§" SET Line="!Line:~2,-1!"&GOTO :loop_class
SET File="!Line:~4,-1!"
FOR /F "tokens=1 delims=§" %%B IN (!File!) DO SET FileTitle=%%B&SET File=
SET File
GOTO :EOF

:File_name
SET Filepath="%~1"
SET Filename="%~nx1"
GOTO :EOF

jaclaz


DosCode - No offense was meant, and I'm a CMD script fan. I've written CMD scripts that are almost 3000 lines long. There are definitely cases, IMO, where CMD script is faster and more flexible than other options. There's a reason that it has existed since before Windows up to the present day. It can be very powerful. I encourage you to pursue learning how to accomplish what you want using CMD script.

But I would also STRONGLY suggest you try CoffeeFiend's script at least once, using the instructions I listed in my last post. CoffeeFiend accomplished in that one post everything you have asked for in every post you have made here since you became a member. Everything. CoffeeFiend and I don't need to continue on. That is all you need. If nothing else you can have it as a backup approach.

The post is appropriate to leave in this thread. This section deals with all types of scripting, not just CMD script, and that script deals directly with what you wanted to accomplish as an end goal. Others who read this thread might be interested in alternative approaches, as I was. You don't have to listen to our advice. But the threads here are for the benefit of all readers, not just you. Our two posts do not distract from your overall goal as much as your posts, which have been scattered over multiple threads and have yet to come up with a working solution. You have been asking about bits and pieces for a week now, and we didn't even know what your overall goal was until 18 hours ago.

Now that jaclaz, our CMD script wizard, has stepped in, I'm sure he can help you come up with a script that can meet your needs, and I wish you well. But that does not make alternatives less valid.

Cheers and Regards

Edited by bphlpt


I hate to ask for more after you've put this together, but to save time blindly using Google, would you mind pointing me to a few links where I can read about the key parts of your script?

There is no central place for all of this that I'm aware of, nor am I a VBScript guru (I've mostly given up on it, and most of my VBScript knowledge dates back to the Win2k era, part of it being from writing classic ASP pages). As such I'm not certain what the best resources out there are today. But here are some bits and pieces that might help:

Msxml2.ServerXMLHTTP.6.0 is one of several objects which you can use to get content from the web (just like web pages that use "AJAX" stuff). The Open method initializes the request: you pass it the HTTP verb, the URL, and whether the call is asynchronous (False here, so Send does not return until the response has arrived). The Send method is what actually makes the request and gets the response (HTML here) back in its ResponseText property, which I then parse using regular expressions.
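For readers who want to compare with another language, the Open / Send / ResponseText sequence described above can be sketched in Python using only the standard library. This is not the script from the thread, just a rough equivalent; `fetch_text` is a hypothetical helper name, and falling back to UTF-8 when the server does not declare a charset is an assumption.

```python
import urllib.request


def fetch_text(url, timeout=30.0):
    """Blocking GET that returns the response body decoded as text."""
    # Roughly: oXmlHttp.Open "GET", url, False
    request = urllib.request.Request(url, method="GET")
    # Roughly: oXmlHttp.Send (returns only once the response is available)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumption: use UTF-8 when no charset is declared in the headers.
        charset = response.headers.get_content_charset() or "utf-8"
        # Roughly: oXmlHttp.ResponseText
        return response.read().decode(charset, errors="replace")
```

The helper can be tried without touching any server by passing a `data:` URL, for example `fetch_text("data:text/plain,hello")`.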

As for using regular expressions, the idea is to design them to have submatches for the content you want (the desired chunks surrounded by parentheses). Then you already have the info you want without further parsing or processing.

And finally, the regular expressions explained:

<a\shref="(/Dokumenter/dsweb/View/Collection-\d*)">

<a matches literal text

\s matches a whitespace character (here, the space after <a)

href=" matches more literal text

( this marks the beginning of the information I'm interested in (the submatch which here is the URL of a child page)

/Dokumenter/dsweb/View/Collection- matches some more literal text

\d matches a numeric digit (0 to 9)

* means that the preceding digit can be present any number of times (zero to infinity)

) marks the end of the information I care about

"> matches literal text
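To try this first pattern without hitting the server, here is a minimal sketch in Python rather than VBScript (Python's `re` module accepts the same syntax for this particular pattern). The sample HTML is invented for illustration and only shaped like the links the pattern expects; `child_page_paths` is a hypothetical helper name.

```python
import re

# Same pattern as in the VBScript; IGNORECASE mirrors oRegExp.IgnoreCase = True.
CHILD_PAGE = re.compile(
    r'<a\shref="(/Dokumenter/dsweb/View/Collection-\d*)">',
    re.IGNORECASE,
)

# Invented sample markup, not copied from the real page.
SAMPLE = (
    '<a href="/Dokumenter/dsweb/View/Collection-357">AIP</a>\n'
    '<a href="/Dokumenter/dsweb/View/Collection-358">GEN 0</a>\n'
    '<a href="/Dokumenter/dsweb/Get/Document-1/x.pdf">not a collection</a>\n'
)


def child_page_paths(html):
    """Return the captured path (Submatches(0) in VBScript terms) of every match."""
    # With exactly one group, findall returns the group text for each match.
    return CHILD_PAGE.findall(html)


if __name__ == "__main__":
    paths = child_page_paths(SAMPLE)
    # The VBScript skips the first hit because it is the page itself.
    for path in paths[1:]:
        print("http://www.slv.dk" + path)
```

Only the two Collection links are captured; the third anchor is ignored because its path does not match the literal text in the pattern.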

<a\shref="(/Dokumenter/dsweb/Get/Document.*pdf)"\sclass="uline"><b>(.*?)</b>

<a matches literal text

\s matches a whitespace character (here, the space after <a)

href=" matches more literal text

( this marks the beginning of the information I'm interested in (the submatch, which here is the URL of the PDF)

/Dokumenter/dsweb/Get/Document matches literal text

. matches any character

* which is there zero times or more

pdf matches more literal text

) marks the end of the information I care about

" literal text

\s whitespace character (a space here)

class="uline"><b> literal text

( marks the beginning of the second group of info I want (the next numbered submatch, which is the desired filename for the PDF)

. is still any old character

*? is the lazy ("non-greedy") version of *: it matches any number of times, but keeps the match as short as possible

) marks the end of the 2nd group

</b> literal text
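The difference between `.*` and `.*?` in this second pattern is easiest to see on a line that contains more than one `</b>`. The sketch below is again Python rather than VBScript, and the sample line (document path and title) is made up for illustration; `pdf_links` and `greedy_title` are hypothetical helper names.

```python
import re

# Same pattern as the second oRegExp.Pattern in the VBScript.
PDF_LINK = re.compile(
    r'<a\shref="(/Dokumenter/dsweb/Get/Document.*pdf)"\sclass="uline"><b>(.*?)</b>',
    re.IGNORECASE,
)

# Invented sample line; the document path and the title are placeholders.
SAMPLE = (
    '<a href="/Dokumenter/dsweb/Get/Document-100/EK_GEN_0_1_en.pdf" class="uline">'
    '<b>GEN 0.1 Preface</b></a> <b>unrelated bold text</b>'
)


def pdf_links(html):
    """Return (url_path, title) tuples, i.e. Submatches(0) and Submatches(1)."""
    return PDF_LINK.findall(html)


def greedy_title(html):
    """Same idea with a greedy (.*), to show what the lazy (.*?) avoids."""
    match = re.search(r'<b>(.*)</b>', html)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(pdf_links(SAMPLE))     # title stops at the first </b>
    print(greedy_title(SAMPLE))  # title runs on to the last </b> on the line
```

With the lazy group the title ends at the first `</b>`; the greedy variant swallows everything up to the last `</b>` on the line, which would make a poor file name.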

I think this covers the most interesting parts :) Not that I use this myself for page scraping/parsing mind you.


Hi. I am a newbie at this forum. DosCode invited me here to post my latest solution to this problem, which was developed in detail on another site. So here it is:


@echo off
setlocal EnableDelayedExpansion
set "source=GEN 0 GENERAL.html"
set "pdf=0_1_en.pdf"
echo In file: "%source%"
echo Look for anchor: "%pdf%"

for /F "delims=" %%c in ('findstr /C:"<a " "%source%" ^| findstr /C:"%pdf%"') do (
    set "tag=%%c"
    rem Get the value of "<b>" sub-tag
    set "tag=!tag:<b>=$!"
    set "tag=!tag:</b>=$!"
    for /F "tokens=2 delims=$" %%b in ("!tag!") do set title=%%b
    echo Title found: "!title!"
)
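The trick in the batch code above (turn `<b>` and `</b>` into a `$` delimiter, then take the second token) can be modeled in a few lines of Python to see where it works and where it does not. This is only an illustrative model, not a replacement for the batch file: `bold_title` is a hypothetical name, the sample lines are invented, and dropping empty tokens is meant to imitate how `for /F` treats runs of delimiters as a single one.

```python
import re


def bold_title(line):
    """Model of: replace <b> and </b> with $, then take token 2 with delims=$."""
    # Batch string substitution ignores case, hence IGNORECASE; a literal $
    # already present in the line acts as a delimiter too.
    pieces = re.split(r"</?b>|\$", line, flags=re.IGNORECASE)
    # Drop empty pieces so adjacent or leading delimiters never yield a token.
    tokens = [piece for piece in pieces if piece]
    return tokens[1] if len(tokens) > 1 else None


if __name__ == "__main__":
    # Invented sample lines.
    print(bold_title('<a href="x/0_1_en.pdf" class="uline"><b>GEN 0.1 Preface</b></a>'))
    # Pitfall: if the line starts with <b>, the title becomes token 1, not token 2.
    print(bold_title('<b>Title</b> rest'))
```

The second call shows the main limitation of the token approach: it relies on some text appearing before the `<b>` tag on the line.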

If you wish, you may review the development of this solution at this site

Regards...

Antonio

EDIT: I fixed a detail in my code (changed "<a>" to "<a ") that prevented it from running correctly...

Edited by Aacini


It is not the solution!

Unless the original remit has changed it is simply a sub function.

BTW I'm unable to access the link to take a look at your .html (I cannot access 'http://www.slv.dk/Dokumenter/' at all; I receive a "Connection closed by remote server" message).


It is not the solution!

Unless the original remit has changed it is simply a sub function.

BTW I'm unable to access the link to take a look at your .html (I cannot access 'http://www.slv.dk/Dokumenter/' at all; I receive a "Connection closed by remote server" message).

It is a solution for the sub function. I guess you tested the Visual Basic script from the guy who posted here, and the admin has blocked your access. I did not tell anyone to go and download from or overload their server. If anybody did so, it is his own fault. I am not glad this happened. But to be exact, I did not test or check that VB script; I just guess there were some instructions in it to download the PDF files. I will paste my code here after your answer.

