BibTeX database from PDFs via DOI

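This somewhat ridiculous Bash one-liner creates a BibTeX database file (.bib) from a directory of PDFs by looking up each DOI through the Crossref API, provided the first page of the PDF prints the DOI in the form doi:10.xxxx/... (or DOI:...). It needs pdftotext (from Poppler or Xpdf), curl, and GNU grep and sed. As the DOI system was introduced in 2000, this will probably not work on vintage PDFs.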

 for pdf in *.pdf; do pdftotext -f 1 -l 1 "$pdf" - | tr -d "\n" | grep -oE "(doi|DOI):\s?[A-Za-z0-9./()-]+[0-9]" | tr '[:upper:]' '[:lower:]' | sed -r 's;doi:\s?;https://api.crossref.org/works/;g' | sed -r 's;$;/transform/application/x-bibtex;g' | xargs curl -fsS 2>/dev/null | sed -e '$a\'; done > allpdf.bib
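
If the one-liner is hard to follow, here is a rough sketch of the same pipeline split into named steps. The function names (extract_dois, bibtex_url, pdfs_to_bib) are my own and purely illustrative. The sketch also assumes the DOI is printed as doi:... or DOI:... on the first page, and it spells the DOI character class out explicitly, so treat it as a starting point rather than a drop-in replacement.

```shell
#!/usr/bin/env bash
# Sketch only: the one-liner split into steps. All function names are hypothetical.

# Read text on stdin and print each "doi:..." match as a bare, lowercased DOI.
extract_dois() {
    tr -d '\n' \
        | grep -oE '(doi|DOI):[[:space:]]?[A-Za-z0-9./()-]+[0-9]' \
        | tr '[:upper:]' '[:lower:]' \
        | sed -E 's;^doi:[[:space:]]?;;'
}

# Build the Crossref URL that returns a BibTeX entry for the DOI in $1.
bibtex_url() {
    printf 'https://api.crossref.org/works/%s/transform/application/x-bibtex\n' "$1"
}

# Print one BibTeX entry for every DOI found on the first page of each PDF.
pdfs_to_bib() {
    local pdf doi
    for pdf in "$@"; do
        pdftotext -f 1 -l 1 "$pdf" - | extract_dois | while IFS= read -r doi; do
            # -f makes curl return an error on HTTP failures (e.g. unknown DOI);
            # the echo only runs on success and keeps entries on separate lines.
            curl -fsS "$(bibtex_url "$doi")" 2>/dev/null && echo
        done
    done
}
```

After sourcing the file, running pdfs_to_bib *.pdf > allpdf.bib should give roughly the same result as the one-liner. Splitting it up mainly makes it easier to check what extract_dois picks out of a particular PDF before any request is sent to Crossref.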
