DosCode

renaming files in CMD scripts


Following 5eraph's code, I tried to write my own version, but it does not do what I expected. I will do this in two steps, so this is just step one, where I want to learn how to work with variables when renaming a file.

I have files whose names start with a digit from 0 to 9, and I want to get that digit into a variable. The first file is 0_1_en.pdf, so I would like to get 0 into sPDFName.


SETLOCAL EnableDelayedExpansion
for /f "delims=" %%PDF in ('dir /b *.pdf') do (
    SET "sPDFName=%%~nxPDF"
    echo "!sPDFName:~0,1!"
    IF "!sPDFName:~0,1!"=="1" (SET "sPDFName=!sPDFName:~0,1!")
    IF "!sPDFName:~0,1!"=="0" (SET "sPDFName=!sPDFName:~0,1!")
    echo "!sPDFName:~-0!"
    echo "%sPDFName%"
    pause
    for /f "delims=" %%HTML in ('dir /b *.html') do (
        :: compare "GEN "+sPDFName+" " to be equal to "!%%HTML:~0,6!"
        IF "!sPDFName:~-0!"=="!%%HTML:~0,6!" echo %%HTML
    )
)

I display the variable sPDFName with echo "!sPDFName:~-0!", which is probably not the best way. How do I display the value of sPDFName? %sPDFName% shows nothing.

I also need to make this comparison:

"GEN "+sPDFName+" " should be equal to "!%%HTML:~0,6!"

so that if the pdf file starts with 0, the file "GEN 0 GENERAL.html" will be used in the next step.

But I don't know how to join these three parts together on the left side of the expression.

Edited by DosCode


Guest

%%PDF

%%HTML

You should use only one letter as a FOR variable. Try %%P and %%H instead.

To display sPDFName just use the following:

ECHO !sPDFName!
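
As a minimal sketch of the two points above, the snippet below uses a single-letter FOR variable and shows why %sPDFName% comes out empty inside a parenthesized block while !sPDFName! does not. The two file names are hypothetical stand-ins for a real directory listing, and the script is assumed to be saved as a .cmd file and run from a command prompt.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Make sure no value is left over from the calling environment.
set "sPDFName="

rem Hypothetical file names, used here instead of a real dir listing.
for %%P in (0_1_en.pdf 1_2_en.pdf) do (
    set "sPDFName=%%P"
    rem The percent form was expanded once, when the whole block was parsed.
    echo percent form: [%sPDFName%]
    rem The exclamation form is expanded each time the line runs.
    echo delayed form: [!sPDFName!]
    echo first char:   [!sPDFName:~0,1!]
)
```

The percent form stays empty on every pass because the whole block is read and expanded before the loop starts, which is the behaviour the original question ran into.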


Please, DosCode, go spend some time reading as many of the pages at http://ss64.com/ as you can, especially those listed here - http://ss64.com/nt/syntax.html for variable and command syntax, and here - http://ss64.com/nt/ for details and usage examples of the various CMD script commands. It will be well worth your time.

I'm surprised that you have been tasked with a job that you seem unprepared for. While most of the tasks you are trying to accomplish seem simple to do in CMD scripts, once you are familiar with them, you might be better off using another programming language, one you are more familiar with. Is there a reason you have been asked to do this using CMD scripts?

Cheers and Regards


Well, I have been studying the manual since last night and have read "all" about SET and variables, but there are some things I can't find.

I read that the exclamation mark is used with variables and has something to do with "expanding" the variable (my CMD is not in English, so I am trying to translate back). But I did not find how to connect/join the two types of variables with strings. There are articles and examples about searching for substrings and replacing them.

So can you point me to where I can find it? For example, a variable @something@ joined with "somethingelse" or with @somethingelse@, or a variable !something! joined with "somethingelse" or with !somethingelse!, or maybe with @something@...
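
For reference, CMD has no join operator: variables and literal text are joined simply by writing them next to each other. A minimal sketch, assuming the file names mentioned earlier in this thread and the hypothetical variable names sPrefix, sHTMLName and sCandidate:

```batch
@echo off
setlocal EnableDelayedExpansion

rem Hypothetical names taken from the examples in this thread.
set "sPDFName=0_1_en.pdf"
set "sHTMLName=GEN 0 GENERAL.html"

rem Joining is plain juxtaposition inside the quotes. There is no + operator.
rem The trailing space before the closing quote is part of the value.
set "sPrefix=GEN %sPDFName:~0,1% "
echo prefix is [%sPrefix%]

for %%H in ("%sHTMLName%") do (
    set "sCandidate=%%~H"
    rem Inside a block, read variables set in the block with the exclamation form.
    if /i "!sCandidate:~0,6!"=="!sPrefix!" (echo match: %%~H) else (echo no match: %%~H)
)
```

The same juxtaposition works with either form of variable, so "GEN !digit! " and "GEN %digit% " build the same string; which form to use depends only on whether the variable was set inside the current block.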

Edited by DosCode


On your original question, could you state whether the .pdf file names always start with a single digit or whether they can start with two or more digits together (3_1_en.pdf, 14_2_en.pdf, 687_3_en.pdf)?

Could you please also provide the information about the .html files you're using, i.e. their naming convention?


The first number in the pdf filename is the chapter. There are about 6 chapters in this case, no more. So we are talking about one digit at the beginning of the filename, which determines which html file I will process. Once I have the digit, I know which file to look in for the information.

For the first chapter, the name of the html file is "GEN 0 GENERAL.html".

There are five more files; the naming convention is: "GEN " + digit + " "

PS: I already know where my error is, so I am now trying to finish this.

Edited by DosCode


So far I have this code:


@echo off
SETLOCAL EnableDelayedExpansion
for /f "delims=" %%P in ('dir /b *.pdf') do (
    SET "sPDFName=%%~nxP"
    echo "!sPDFName:~0,1!"
    IF "!sPDFName:~0,1!"=="1" (SET "sPDFName=!sPDFName:~0,1!")
    IF "!sPDFName:~0,1!"=="0" (SET "sPDFName=!sPDFName:~0,1!")
    SET tempStr="GEN !sPDFName!*.html"
    echo !tempStr!
    ::echo "!sPDFName!"
    for /f "delims=" %%H in ('dir /b *.html') do (
        CALL SET substring=%%H:~!tempStr!%%
        echo Result:%substring%
        IF "%%substring%%"=="%%H" (echo %%H) ELSE (echo NOT FOUND)
        pause
    )
)

I am not sure about the CALL SET expression. It should return the name of %%H if the file exists, and nothing if it does not. It does not give the right result. I probably have to remove the quotes from !tempStr!, but I did not manage to do that.

This assumes you have the files "0_1_en.pdf" and "GEN 0 GENERAL.html".

Edited by DosCode


Are you simply requiring console output for which .pdf's have associated .html's, or is your intention something else?


@DosCode, while it is very smart of you to break down a larger task into smaller pieces and attack them one at a time, when you are asking for help it would probably be useful to lay out the entire job you are trying to accomplish. Since you are unfamiliar with some of the capabilities of the CMD script language, you might have broken the task down in ways that actually make the job harder. So lay the whole thing out for us. Then we'll help you make sure that you've broken it down appropriately and help you with each piece.

Cheers and Regards


What I tried was not a good solution. Here is what I wanted to do, as a working script.


@echo off
SETLOCAL EnableDelayedExpansion

for %%P in (*.pdf) do (
    set "pdfFile=%%P"
    set htmlMask="GEN !pdfFile:~0,1! *.html"
    REM echo !htmlMask!

    echo(
    echo Testing "!pdfFile!": Looking for !htmlMask!
    set "found="
    for %%H in (!htmlMask!) do (
        set found=1
        echo "%%H"

        for /f "delims=" %%b in ('find /i "<title>" ^< "%%H"') do (
            set "pdf=%%P"
            set "source=%%H"
            set "var=%%b"
            call :JUMP
        )

        REM do whatever you need to do with the %%P pdf file and %%H html file
    )
    REM if not defined found echo NOT FOUND
)

echo done - check tempren.bat
goto :EOF

:JUMP
:: code I am working on....

I am going to search the html file %%H. The html file contains an a tag that includes the %pdf% filename (without path). I would like to get the contents of this tag, including the b tag, which holds the title of the pdf file. The goal is to get the title of the pdf file into a variable so I can use it to rename the file. If the a tag contains an image but no text, like this: <a pdffile link><img></a>, then skip this link. The link is on a single line, so I could use grep.
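
A rough sketch of that lookup using only built-in commands, under several loudly stated assumptions: each link sits on one line, has the exact shape <a href><b>title</b></a>, and contains no exclamation marks or characters that are invalid in file names. The sample file, its contents and the variable names are hypothetical; findstr stands in for grep, and the script only prints the ren command instead of running it.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Hypothetical sample data with one text link and one image-only link.
set "source=%TEMP%\gen_sample.html"
> "%source%" echo ^<a href="0_1_en.pdf"^>^<b^>General Information^</b^>^</a^>
>> "%source%" echo ^<a href="0_1_en.pdf"^>^<img src="pdf.gif"^>^</a^>

set "pdf=0_1_en.pdf"
set "title="

rem Keep lines that mention the pdf and drop lines that contain an image tag.
for /f "delims=" %%L in ('findstr /i /c:"%pdf%" "%source%" ^| findstr /v /i /c:"<img"') do (
    set "line=%%L"
    rem Split on the angle brackets. Token 1 is the a tag, token 2 is b, token 3 is the text.
    for /f "tokens=3 delims=<>" %%T in ("!line!") do set "title=%%T"
)

rem Print the rename command for review rather than executing it.
if defined title (echo ren "%pdf%" "!title!.pdf") else (echo no title found for %pdf%)
del "%source%"
```

The token position is tied to the assumed tag layout, so any extra tag or attribute line break in the real pages would shift it. Dropping every line that contains an img tag also skips links that carry both an image and text.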

Edited by DosCode


You have not provided us with enough information to create your script.

The key to your entire project appears to be the .html file, yet the way you have currently tried to break down the task into several topics places more emphasis on the .pdf files. My best understanding of what you want is to search each .html file in the local directory for an a href containing a .pdf file name (with a naming convention which can be likened to the source file name) and which is also local to the script. If true, you wish to rename the file (either the .pdf or .html, I'm not sure) according to the content of its own internal title tag. If that is the case there is very little in what you've got so far which is really of any use to you.

There are too many unknowns, and potential pitfalls with the file naming conventions and html contents for it to be worth formulating anything other than a one off solution with very specific criteria.

Take your .html file for instance: how can you be assured that the content is formatted in such a way that the title is on its own line and, as I've mentioned previously, does not contain an attribute? Are you aware that the cmd language is extremely poor at handling common .html characters such as >, &, % and <?

Although I'd have once enjoyed the challenge of using command scripts alone to perform this task, it simply isn't practical to do so. I would certainly suggest at least integrating the use of third party utilities such as AWK or SED, and strongly suggest you contemplate using a programming language more suited to the task.


If that is the case there is very little in what you've got so far which is really of any use to you.

^ A million times this. Too many asking small questions to hack together disjointed parts of a script, without a clear goal in mind. And if that's the case (the parsing html to rename PDF files description you just gave) then that's totally not how I would do it (not that I imply my own method is the one and only way or anything)


TBH without all of the information and knowledge of which tools/language is used I couldn't say specifically which method I'd use either. I may be more inclined to use a directory structure to group linked files to their respective html rather than having them all in one location and order according to naming convention. It really is very dependent upon the specifics of their scenario.


TBH without all of the information and knowledge of which tools/language is used I couldn't say specifically which method I'd use either.

I completely agree.

I may be more inclined to use a directory structure to group linked files to their respective html rather than having them all in one location and order according to naming convention.

And yet there are so many other options, some of which may work better depending on the specifics. The first thing that comes to mind (assuming that the HTML anchor is always easy to find in the same place in every page, or that there's only ever one anchor tag linking to a PDF per page, which is a big if for sure) is to extract the info from the HTML tag first (looping through the HTML pages) and rename the file based on that. So many ways... Even just for getting at that HTML anchor tag itself (plain old string search, using the DOM, regular expressions, using an HTML or SGML parser, etc).

But yes, without more details we're just wasting time.


It is, as you say, a "one off solution with very specific criteria". All the pdf files are in one folder, and I want to rename them according to the title enclosed in their links. I prefer CMD because I have Windows and am learning it, and I try to stay independent of gnuwin32, but if it is necessary, or an option, I don't mind using gnuwin32 programs. It would also be interesting to learn how grep or sed works.

In the bin folder I have these programs installed: egrep.exe, fgrep.exe, find.exe, grep.exe, locate.exe, pcregrep.exe, pcretest.exe, sed.exe, xargs.exe.

Over the last few days I have worked out several ways of handling these pdfs and htmls, but I would not call it a project. They are just helper scripts, and I would like them to save me a lot of work, provided I am able to understand them and edit them if necessary. I am trying to use the potential of CMD on my computer, which I did not know about before.

In answer to your paragraph in italics: I know what is in the file because I checked it, but I can point you to it if you want to check it online. Check this page here. If you click on one of the yellow folder icons, you will see the html file containing the links to the pdfs I work with.
