Automatic PDF renaming based on title

I have thousands of scientific PDFs that I need to rename, and many of them have no metadata. I would like to create an Automator action that opens a folder, opens each PDF, copies the title, renames the document, and saves it in a new folder. I have spent hours trying to figure this out, so I would greatly appreciate any help. I have an Apple G5 2.26 GHz quad running OS X 10.6. Thanks!

asked Apr 10, 2011 at 14:17

4 Answers

There is Mendeley, an online research tool that allows you to manage scientific publications.

It has a Mendeley Desktop tool where you can drag and drop PDFs. Mendeley will automatically parse the authors and titles from the PDFs.

Then you can rename the file by right-clicking it and choosing "Rename Document Files...". You can also rename multiple files at once.

It's available for Windows and OS X.

answered Apr 10, 2011 at 16:36

I've +1'd Mendeley because it works pretty well, but it can sometimes be flaky in extracting document titles.

Commented Apr 11, 2011 at 15:14

@Ian Sadly, yes. There's never gonna be a perfect solution. It's weird that it doesn't auto-capitalize titles when they're all caps in the original PDF.

Commented Apr 11, 2011 at 17:39

It's so awesome! Saved me so much time! Thank you so much!

Commented Aug 18, 2013 at 6:12

Zotero does this better, and without the corporate association Mendeley now suffers from.

Commented Sep 11, 2018 at 18:08

@JackWasey You're right. Considering that my post is from 2011, I'm surprised to see it's owned by Elsevier now, and how little it has improved over time.

Commented Sep 12, 2018 at 7:21

If I understand you correctly, you want to extract the paper title which is present on the first page of the PDF (usually in bigger print than the abstract and following text) and use it as the file name.

I'm afraid that you probably won't find a one-size-fits-all solution, since there can be varying amounts of non-title text at the beginning of the PDF, making it hard to extract the actual title for PDFs coming from different journals.

To get a solution that works for a certain percentage of your PDFs, I would probably:

1. Use Ghostscript's pdf2ps and ps2ascii to extract plain text from the PDF.
2. Parse this plain text for a journal title somewhere in the first kilobyte or so.
3. Depending on the journal, try to come up with a heuristic that extracts the paper title from the plain text.
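The steps above could be sketched in Python roughly as follows. This is a minimal sketch rather than a tested pipeline: it assumes Ghostscript's `pdf2ps` and `ps2ascii` are installed and on the PATH, and the function names (`pdf_to_text`, `guess_title`, `safe_filename`, `copy_renamed`) as well as the "first reasonably long line" heuristic are hypothetical placeholders for whatever per-journal rule turns out to work.

```python
import re
import shutil
import subprocess
import tempfile
from pathlib import Path


def pdf_to_text(pdf_path):
    """Convert a PDF to plain text with Ghostscript's pdf2ps and ps2ascii.

    Assumes both commands are available on the PATH.
    """
    with tempfile.TemporaryDirectory() as tmp:
        ps_path = Path(tmp) / "out.ps"
        subprocess.run(["pdf2ps", str(pdf_path), str(ps_path)], check=True)
        # With no output file given, ps2ascii writes the text to stdout.
        result = subprocess.run(
            ["ps2ascii", str(ps_path)],
            check=True, capture_output=True, text=True, errors="replace",
        )
    return result.stdout


def guess_title(text, min_words=4, max_chars=1024):
    """Placeholder heuristic: the first line in the first kilobyte
    that contains at least `min_words` words."""
    for line in text[:max_chars].splitlines():
        line = " ".join(line.split())  # collapse runs of whitespace
        if len(line.split()) >= min_words:
            return line
    return None


def safe_filename(title, max_len=100):
    """Drop characters that are awkward in file names and add '.pdf'."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "", title)
    cleaned = " ".join(cleaned.split())
    return cleaned[:max_len].rstrip() + ".pdf"


def copy_renamed(src_dir, dst_dir):
    """Copy each PDF in src_dir into dst_dir under its guessed title.

    Files with identical guessed titles would overwrite each other here.
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        title = guess_title(pdf_to_text(pdf))
        # Keep the original name when no title could be guessed.
        name = safe_filename(title) if title else pdf.name
        shutil.copy2(pdf, dst / name)
```

Copying into a separate folder instead of renaming in place keeps the originals untouched, which seems prudent given that the heuristic will guess wrong for some journals.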

Of course if you can find a tool that can extract relative text size as well as plain text from a PDF, that would probably also greatly help.

Good luck - would be interesting to see if you find a way to automate that! The main thing I do when downloading articles myself is to name them in a systematic way, but it sure would be great to have something to do this afterwards.

answered Apr 10, 2011 at 16:35 Jonas Heidelberg

Luckily, there is a solution, see my answer :)

Commented Apr 10, 2011 at 16:37

@slhck - cool, didn't know Mendeley can do that :-). So it batch processes all PDFs if you drag-and-drop them simultaneously?

Commented Apr 10, 2011 at 16:40

Yep, even for thousands of files!

Commented Apr 10, 2011 at 16:46

If you do not want to use external software and feel like writing your own script, try opening your PDFs as plain text in a text editor and look for patterns. Either search for the keyword 'title', or search for words from the title and see where they appear.

To give you a few examples (scientific journals in chemistry):

ACS (American Chemical Society): the title appears between brackets after the second occurrence of the keyword '/title'

Wiley publishing: the title appears between brackets after the first (and only) occurrence of the keyword '/Title'

RSC Publishing: does not have the title in plain text.

Springer: it seems to depend on the journal

Since most journals I read are from Wiley or ACS, the situation looks rather good for me.
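Searching the raw file for the title keyword could be sketched like this in Python. It is a minimal sketch, assuming the title is stored uncompressed as a literal string in parentheses right after the keyword, as described for the Wiley and ACS files above. The names `find_titles`, `pick_title` and `title_from_pdf` are hypothetical, and escaped or nested parentheses, hex strings, and UTF-16 encoded titles are deliberately not handled.

```python
import re

# Matches "/Title (some text)" in the raw bytes; the case-insensitive flag
# also catches the lowercase "/title" spelling.
TITLE_RE = re.compile(rb"/Title\s*\(([^()]*)\)", re.IGNORECASE)


def find_titles(raw):
    """Return every title-like string found in the raw PDF bytes, in order."""
    return [m.decode("latin-1").strip() for m in TITLE_RE.findall(raw)]


def pick_title(raw, occurrence=0):
    """Return the n-th match (0-based), or None if there are too few."""
    titles = find_titles(raw)
    return titles[occurrence] if len(titles) > occurrence else None


def title_from_pdf(path, occurrence=0):
    """Read a PDF as raw bytes and pick the requested title occurrence."""
    with open(path, "rb") as fh:
        return pick_title(fh.read(), occurrence)
```

Under the observations above, `occurrence=0` would correspond to the Wiley case (first and only match) and `occurrence=1` to the ACS case (second match).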

This could be a plan:

1. Study PDFs from the publishers whose journals you read most often.
2. Pick out those that have the title in plain text. This should not be a problem, since they all include their name in the last kilobytes of the PDF.
3. Handle those with a script.
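The publisher check in this plan could be sketched as follows. This is only an illustration: the marker strings in `PUBLISHER_MARKERS`, the 4 KB tail size, and the function names are assumptions, and the real markers would have to be found by inspecting your own files.

```python
# Hypothetical marker strings; replace them with whatever each publisher
# actually writes near the end of its PDFs.
PUBLISHER_MARKERS = {
    "acs": b"American Chemical Society",
    "wiley": b"Wiley",
}

# Which occurrence of the title keyword (0-based) holds the article title,
# following the per-publisher observations above.
TITLE_OCCURRENCE = {"acs": 1, "wiley": 0}


def detect_publisher(raw, tail_size=4096):
    """Look for a known publisher marker in the last `tail_size` bytes."""
    tail = raw[-tail_size:].lower()
    for name, marker in PUBLISHER_MARKERS.items():
        if marker.lower() in tail:
            return name
    return None


def title_occurrence_for(raw):
    """Return which title occurrence to use, or None for unknown publishers."""
    publisher = detect_publisher(raw)
    return TITLE_OCCURRENCE.get(publisher)
```

Files for which `title_occurrence_for` returns `None` would be left alone, or handed to another tool.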

Depending on how many of the journals you read use the title tag for the article's title this could be useful or not.