
renaming files in CMD scripts



I guess you tested the Visual Basic script from the guy who posted here, and the admin has blocked your access. I did not say to go and download from or overload their server.
I haven't tested anyone else's code; I have no need to. I am prevented from accessing the site simply because I have a blocking application which does not like that site.

However I've taken a look at the html code for GEN 0 GENERAL and it appears that your .pdf link lines look like this:

<a href="/Dokumenter/dsweb/Get/Document-408/EK_GEN_0_1_en.pdf" class="uline"><b>GEN 0.1 Preface</b></a>

<a href="/Dokumenter/dsweb/Get/Document-409/EK_GEN_0_2_en.pdf" class="uline"><b>GEN 0.2 Record of AIP Amendments</b></a>

<a href="/Dokumenter/dsweb/Get/Document-410/gen_0_3.pdf" class="uline"><b>GEN 0.3 Record of AIP Supplements</b></a>

<a href="/Dokumenter/dsweb/Get/Document-411/EK_GEN_0_4_en.pdf" class="uline"><b>GEN 0.4 Checklist of AIP Pages</b></a>

<a href="/Dokumenter/dsweb/Get/Document-412/EK_GEN_0_5_en.pdf" class="uline"><b>GEN 0.5 List of Hand Amendments to the AIP</b></a>

<a href="/Dokumenter/dsweb/Get/Document-413/EK_GEN_0_6_en.pdf" class="uline"><b>GEN 0.6 Table of Contents to Part 0 and 1</b></a>

If these are indeed the lines you are looking for then the following will always fail:
findstr /C:"<a>"
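
To see why, here is a minimal Python sketch (Python is used only for illustration; the sample line is copied from the listing above, and `has_literal` is a hypothetical helper, not part of any script in this topic). A literal search for `<a>` cannot match, because every anchor in that listing opens with `<a ` followed by attributes:

```python
# Sample anchor line copied from the HTML listing above.
line = ('<a href="/Dokumenter/dsweb/Get/Document-408/EK_GEN_0_1_en.pdf" '
        'class="uline"><b>GEN 0.1 Preface</b></a>')


def has_literal(text: str, needle: str) -> bool:
    """Literal substring test, similar in spirit to findstr /C:"needle"."""
    return needle in text


print(has_literal(line, "<a>"))  # the bare tag does not occur in the line
print(has_literal(line, "<a "))  # the start of a tag with attributes does
```

Searching for the tag start plus a space (or for the .pdf name itself) is therefore the safer test.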



I guess you tested the Visual Basic script from the guy who posted here, and the admin has blocked your access. I did not say to go and download from or overload their server. If anybody did so, it is his own fault. I am not glad this happened. But to be exact, I did not test or check that VB script; I just guessed there were some instructions in it to download the PDF files. I will paste my code here after your answer.

Yzöwl might not have tested the VB script, but I did of course, once, and my access has not been blocked. So there were no consequences to my "fault". Sorry to disappoint you. Don't worry, I'm not going to download more from the site; there is no need. And it is wrong to overload anyone's server for any reason. You also downloaded the PDF files from the site at least once, with wget I believe you said, otherwise you wouldn't need to rename them. It is too bad you didn't test CoffeeFiend's script; you would have a backup plan and you might have learned something. I look forward to seeing your completed, working script. I will be happy to learn any new tricks you're willing to share, in any programming language. I'm just sorry you don't seem to have the same attitude.

Cheers and Regards


Yzöwl: Sure

findstr /C:"<a>"

will fail. Searching for this tag makes no sense, because an <a> tag always has some attributes.

bphlpt: Well, I was worried somebody could abuse the link by downloading a large number of files, because if more people did that at the same time, I think it could slow down or maybe overload the server. I don't know. When I downloaded the files it took me 1 to 3 minutes, but I downloaded more files than just these, and I did it in parts. I don't need the VB script, even if it may be good. Realize how much time I have spent learning the cmd things, even though I am pretty bored (not in a bad sense, rather "lazy") and not devoting as much time to this job as I could. So for now I am just posting what I have on my PC, and I am going to get some rest.

You can correct me if you see any mistakes in my code.


@echo off
setlocal EnableDelayedExpansion
for %%P in (*.pdf) do (
    set "pdfFile=%%P"
    REM Build the HTML file mask from the first character of the PDF name
    set htmlMask="GEN !pdfFile:~0,1! *.html"
    REM echo !htmlMask!
    echo Testing "!pdfFile!": Looking for !htmlMask!
    set "found="
    for %%H in (!htmlMask!) do (
        set found=1
        echo "%%H"
        for /f "delims=" %%b in ('find /i "<title>" ^< "%%H"') do (
            set "pdf=%%P"
            set "source=%%H"
            set "var=%%b"
            call :JUMP
        )

        REM do whatever you need to do with the %%P pdf file and %%H html file
    )
    REM if not defined found echo NOT FOUND
)

echo done - check tempren.bat
REM Stop here so the main script does not fall through into the subroutine
goto :EOF

:JUMP
REM Get title for pdf from html file

set "source=%source%"
set "pdf=%pdf%"

rem Process each line in %source% file:
for /F "usebackq delims=" %%c in ("%source%") do (
    set "line=%%c"
    REM Test if the line contains the pdf file I am looking for:
    SET "pdfline=!line:%pdf%=!"
    if not "!pdfline!" == "!line!" (
        REM Test if the pdfline contains tag b
        if not "!pdfline:*><b>=!" == "!pdfline!" (
            cls
            REM Turn both b tags into $ so the title becomes token 2
            set "tag=!pdfline:<b>=$!"
            set "tag=!tag:</b>=$!"
            for /F "tokens=2 delims=$" %%b in ("!tag!") do set title=%%b
            REM Replace characters that are not allowed in file names
            set "title=!title::=-!"
            set "title=!title:\=-!"
            set "title=!title:/=-!"
            set "title=!title:|=-!"
            set "title=!title:?=-!"
            echo Title found: "!title!"
            pause
        )
    )
)
goto :EOF

I will finish it later :-) I'm so lazy...
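
For comparison, the title extraction and clean-up done in :JUMP can be sketched in a few lines of Python. This is only an illustration of the same idea, not a replacement for the batch file; `title_from_line` and `safe_name` are hypothetical helper names, the second sample title is made up, and extending the replaced characters to `*`, `"`, `<` and `>` is an assumption that goes beyond the script above:

```python
import re

# The batch script replaces : \ / | ? with "-". The characters * " < > are
# also invalid in Windows file names; including them here is an assumption.
INVALID = r'[:\\/|?*"<>]'


def title_from_line(line):
    """Return the text between <b> and </b>, or None if either tag is missing."""
    start = line.find("<b>")
    end = line.find("</b>", start + 3)
    if start == -1 or end == -1:
        return None
    return line[start + 3:end]


def safe_name(title):
    """Replace characters that cannot appear in a file name with '-'."""
    return re.sub(INVALID, "-", title)


line = ('<a href="/Dokumenter/dsweb/Get/Document-409/EK_GEN_0_2_en.pdf" '
        'class="uline"><b>GEN 0.2 Record of AIP Amendments</b></a>')
print(title_from_line(line))       # GEN 0.2 Record of AIP Amendments
print(safe_name("ENR 1.1: A/B?"))  # ENR 1.1- A-B-
```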


Why then did you invite someone who has not only failed to answer your question but also provided non-working code to join our Forum and post it in this Topic?

Now if I think back to a previous Topic of yours I seem to recall you asking about changing the .pdf file names by removing some characters from the beginning of them, are these in fact the same files that you are once again renaming?

e.g. Taking the following line from my previous post's GEN 0 GENERAL.html output:

<a href="/Dokumenter/dsweb/Get/Document-409/EK_GEN_0_2_en.pdf" class="uline"><b>GEN 0.2 Record of AIP Amendments</b></a>
Have you not already renamed the downloaded file, "EK_GEN_0_2_en.pdf", to "0_2_en.pdf"? And are you now wanting to further rename it to "GEN 0.2 Record of AIP Amendments.pdf"?

Why then did you invite someone who has not only failed to answer your question but also provided non-working code to join our Forum and post it in this Topic?

The name of the topic is "renaming files in CMD scripts". I believe his code is fine and works. But I just want a CMD solution, or CMD plus GnuWin.

Originally I did not know that I would rename the files by HTML title; I made that decision later. I can still remove the unwanted prefix by script.

Edited by DosCode

Oh, I overlooked the post from Aacini (and other posts from that time), and I did not see the error in his code because I did not test it. Sorry to all. But never mind; thank you for coming here and for trying, Aacini. The problem is that the a tag always has attributes. But your solution is inspiring to me. I could try using a regex for this, but I have no time to think about it now. What about trying a findstr regex like <a *>.*</a>, with an ungreedy option added?
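
On the ungreedy idea: as far as I know, findstr's regular expressions have no non-greedy quantifier, and findstr prints whole matching lines rather than the matched text, so this would need another tool. Note also that `<a *>` means "<a", then zero or more spaces, then ">", so it would not match a tag with attributes. The difference between greedy and ungreedy matching can be sketched in Python (the two-anchor line is a made-up example, chosen only to make the difference visible):

```python
import re

# Hypothetical line holding two anchors.
line = '<a href="x.pdf"><b>One</b></a> <a href="y.pdf"><b>Two</b></a>'

# Greedy: .* grabs as much as possible, so both anchors end up in one match.
greedy = re.findall(r'<a .*>.*</a>', line)

# Ungreedy: .*? stops at the first possible end, giving one match per anchor.
lazy = re.findall(r'<a .*?>.*?</a>', line)

print(len(greedy))  # 1
print(len(lazy))    # 2
```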

Edited by DosCode

Oh, I overlooked the post from Aacini (and other posts from that time), and I did not see the error in his code because I did not test it.

What about the snippet I posted? :unsure:

Did you also fail to try it? :w00t:

jaclaz

Edited by jaclaz

Originally I did not know that I would rename the files by HTML title; I made that decision later. I can still remove the unwanted prefix by script.
Is that your answer to my question? Is the following example what you are trying to achieve?
e.g. Taking the following line from my previous post's GEN 0 GENERAL.html output:
<a href="/Dokumenter/dsweb/Get/Document-409/EK_GEN_0_2_en.pdf" class="uline"><b>GEN 0.2 Record of AIP Amendments</b></a>
Have you not already renamed the downloaded file, "EK_GEN_0_2_en.pdf", to "0_2_en.pdf"? And are you now wanting to further rename it to "GEN 0.2 Record of AIP Amendments.pdf"?
Is your goal to rename the downloaded file "EK_GEN_0_2_en.pdf" to "GEN 0.2 Record of AIP Amendments.pdf"?

jaclaz:

Yes, I overlooked it. Maybe because when I came to the PC I saw CoffeeFiend's reply about VBS and did not read it; I just did not see that there were one or two posts above it. Your posts were right between CoffeeFiend's and bphlpt's posts about VBS, which I did not pay attention to. The posts are long on my screen and take up a lot of space, so it is not easy to find my way around. I guess the fault is in the design of this site, because the left column is too wide! I will try to find some time tomorrow. Now it is time to sleep.

Edited by DosCode

If that's all you want to do, here's a quick untested script.

@ECHO OFF
SETLOCAL ENABLEEXTENSIONS ENABLEDELAYEDEXPANSION
FOR /F "TOKENS=*" %%A IN (
    'DIR/B/A-D^|FINDSTR/BERIC:"GEN[ ][0-9][ ].*[.]HTML"'
) DO (
    FOR /F "TOKENS=*" %%B IN (
        'FINDSTR/RIC:"[.]PDF[\"][ ]*CLASS^=\"ULINE\"" "%%A"'
    ) DO (
        SET _L=%%B
        FOR /F "USEBACKQ TOKENS=2,3 DELIMS==" %%C IN ('%%B') DO (
            SET _P=%%C
            SET _D=%%D
            FOR /F "USEBACKQ" %%E IN ('!_P:/^=\!') DO (
                SET _P=%%~nxE
                SET _X=%%~xE
            )
            FOR /F "USEBACKQ DELIMS=<" %%E IN ('!_D:*GEN^=!') DO (
                SET _D=GEN%%E!_X!
            )
            IF EXIST "!_P!" ECHO=REN "!_P!" "!_D!"
        )
    )
)
PAUSE

Remove "ECHO=" from the bottom IF EXIST line and the PAUSE line if you're happy with the output.
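
The "ECHO= first, rename later" approach is a dry run: print every planned rename, inspect the list, and only then let it act. A minimal Python sketch of the same pattern follows; `apply_renames` and its `dry_run` flag are hypothetical names, and the demo works in a temporary directory so no real files are touched:

```python
import os
import tempfile


def apply_renames(plan, folder, dry_run=True):
    """Print each planned rename; perform it only when dry_run is False."""
    done = []
    for old, new in plan:
        src = os.path.join(folder, old)
        if not os.path.exists(src):
            continue  # like IF EXIST: skip files that are not present
        print(f'REN "{old}" "{new}"')
        if not dry_run:
            os.rename(src, os.path.join(folder, new))
        done.append((old, new))
    return done


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, "EK_GEN_0_2_en.pdf"), "w").close()
    plan = [("EK_GEN_0_2_en.pdf", "GEN 0.2 Record of AIP Amendments.pdf"),
            ("missing.pdf", "Never used.pdf")]
    apply_renames(plan, folder)                 # preview only
    apply_renames(plan, folder, dry_run=False)  # actually rename
    print(os.listdir(folder))
```

Keeping the preview as the default means a forgotten flag cannot rename anything by accident.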

I do not give permission for this script to be posted on another site.


Hi. The big problem with my code is that in this line:


for /F "delims=" %%c in ('findstr /C:"<a>" "%source%" ^| findstr /C:"%pdf%"') do (

the start of tag must be "<a " instead of "<a>", as Yzöwl noted:

... then the following will always fail:
findstr /C:"<a>"

I have fixed this problem in my original post above.

As I said on the original site, I don't know much about .PDF files, and I got this detail confused because the tag <a> was mentioned there. However, I noted this point when I indicated that I had tested the program with this "GEN 0 GENERAL.html" example file:


Line one
<a>href="/Dokumenter/EK_GEN_0_X_en.pdf" class="uline"><b>GEN 0.X Preface</b></a>
Line three
<a>href="/Dokumenter/EK_GEN_0_1_en.pdf" class="uline"><b>GEN 0.1 Preface</b></a>
Line five

I apologize for any trouble caused by this detail...


Now that it might already be solved (haven't actually tried any solution but I trust Yzöwl), here's a PowerShell one-liner that does the job (i.e. looks inside all the htm files, figures out the old PDF name as downloaded, and renames all of them based on the title of the anchor tag -- assuming all htm and pdf files are in the same directory):


gc *.htm|?{$_ -match [regex]'".*/(.*pdf)".*?b>(.*?)<'}|%{ren $matches[1] ($matches[2]+".pdf")}

5 minutes and 94 characters total :whistle:
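
For readers who want to see what the regular expression in that one-liner captures, here is a small Python sketch applying the same pattern to one of the anchor lines quoted earlier in this topic. It only illustrates the two capture groups and renames nothing; using `re.IGNORECASE` to mirror the case-insensitive default of PowerShell's -match is my assumption about the intended behaviour:

```python
import re

# Pattern copied from the PowerShell one-liner above.
PATTERN = re.compile(r'".*/(.*pdf)".*?b>(.*?)<', re.IGNORECASE)

line = ('<a href="/Dokumenter/dsweb/Get/Document-409/EK_GEN_0_2_en.pdf" '
        'class="uline"><b>GEN 0.2 Record of AIP Amendments</b></a>')

m = PATTERN.search(line)
old_name = m.group(1)           # file name after the last "/" in the href
new_name = m.group(2) + ".pdf"  # text of the <b> element plus the extension

print(old_name)  # EK_GEN_0_2_en.pdf
print(new_name)  # GEN 0.2 Record of AIP Amendments.pdf
```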

