Bibliography management with BibTeX

If you are starting from scratch, we recommend using biblatex: the package provides localization in several languages, is actively developed, and makes bibliography management easier and more flexible.

Introduction

Many tutorials have been written about what BibTeX is and how to use it. However, based on our experience of providing support to Overleaf’s users, it’s still one of the topics that many newcomers to LaTeX find complicated, especially when things don’t go quite right: citations not appearing, problems with authors’ names, references not sorted in the required order, URLs not displayed in the reference list, and so forth.

In this article we’ll pull together all the threads relating to citations, references and bibliographies, as well as how Overleaf and related tools can help users manage these.

We’ll start with a quick recap of how BibTeX and bibliography database (.bib) files work, and look at some ways to prepare .bib files. This does, of course, run the risk of repeating some of the material contained in many online tutorials, but future articles will expand our coverage to include bibliography styles and biblatex, the alternative package and bibliography processor.

Bibliography: just a list of \bibitems

Let’s first take a quick look “under the hood” to see what a LaTeX reference list consists of. Please don’t start coding your reference list like this, because later in this article we’ll look at other, more convenient, ways to do it.

A reference list is really just a thebibliography list of \bibitems:

\begin{thebibliography}{9}
\bibitem{texbook}
Donald E. Knuth (1986) \emph{The \TeX{} Book}, Addison-Wesley Professional.

\bibitem{lamport94}
Leslie Lamport (1994) \emph{\LaTeX: a document preparation system}, Addison
Wesley, Massachusetts, 2nd ed.
\end{thebibliography}

By default, the thebibliography environment is a numbered list with labels [1], [2] and so forth. If the document class is article, \begin{thebibliography} automatically inserts an unnumbered section heading with the text of \refname (default value: References). If the document class is book or report, an unnumbered chapter heading with \bibname (default value: Bibliography) is inserted instead. Each \bibitem takes a cite key as its argument, which you can use with \cite commands, followed by information about the reference entry itself. So if you now write

\LaTeX{} \cite{lamport94} is a set of macros built atop \TeX{} \cite{texbook}.

together with the thebibliography block from before, this is what gets rendered into your PDF when you run a LaTeX processor (such as latex, pdflatex, xelatex or lualatex) on your source file:


Figure 1: Citing entries from a thebibliography list.

Notice how each \bibitem is automatically numbered, and how \cite then inserts the corresponding numerical label.
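To try this yourself, the fragments above can be combined into one complete file. This is a minimal sketch assuming the article class and no babel package; the \renewcommand line is an optional, hypothetical example showing how the heading stored in \refname can be changed (in book or report you would redefine \bibname instead).

```latex
\documentclass{article}

% Optional: replace the default "References" heading (article class).
% Without babel, redefining \refname in the preamble is enough.
\renewcommand{\refname}{Works Cited}

\begin{document}

\LaTeX{} \cite{lamport94} is a set of macros built atop \TeX{} \cite{texbook}.

% The argument 9 is sample text: labels get as much room as "9" needs.
\begin{thebibliography}{9}
\bibitem{texbook}
Donald E. Knuth (1986) \emph{The \TeX{} Book}, Addison-Wesley Professional.

\bibitem{lamport94}
Leslie Lamport (1994) \emph{\LaTeX: a document preparation system}, Addison
Wesley, Massachusetts, 2nd ed.
\end{thebibliography}

\end{document}
```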

The thebibliography environment takes an argument that sets the label width: sample text as wide as the widest label expected in the list. In this example we only have two entries, so 9 (a single digit) is enough. If you have ten or more entries, though, you may notice that the numerical labels in the list start to get misaligned:


Figure 2: thebibliography with a label that’s too short.

We’ll have to use \begin{thebibliography}{99} instead, so that the reserved label width is enough to accommodate two-digit labels, like this:


Figure 3: thebibliography with a longer label width.
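Because the argument is sample text rather than a count, the same rule applies if you replace the numbers with your own labels through \bibitem’s optional argument. In this sketch the labels Knu86 and Lam94 are hypothetical; "Lam94" is passed as the sample because it is roughly the widest label in the list.

```latex
% Hand-written labels instead of automatic numbers.
% The environment argument is sample text about as wide as the widest label.
\begin{thebibliography}{Lam94}
\bibitem[Knu86]{texbook}
Donald E. Knuth (1986) \emph{The \TeX{} Book}, Addison-Wesley Professional.

\bibitem[Lam94]{lamport94}
Leslie Lamport (1994) \emph{\LaTeX: a document preparation system}, Addison
Wesley, Massachusetts, 2nd ed.
\end{thebibliography}
```

With this list, \cite{texbook} prints [Knu86] instead of [1].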

If you compile this example code snippet on a local computer, you may notice that after the first time you run pdflatex (or another LaTeX processor), the reference list appears in the PDF as expected, but the \cite commands just show up as question marks [?].

This is because during the first LaTeX run the cite keys from each \bibitem (texbook, lamport94) are written to the .aux file and are not yet available for reading by the \cite commands. Only on the second run of pdflatex are the \cite commands able to look up each cite key from the .aux file and insert the corresponding labels ([1], [2]) into the output.
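To make this concrete, here is an excerpt of what the .aux file might contain after the first run, assuming the source file is named main.tex and hyperref is not loaded (hyperref writes \bibcite with extra arguments). On the next run, LaTeX reads these lines at \begin{document}, and each \bibcite defines the label that \cite later prints.

```latex
% main.aux after the first pdflatex run (excerpt)
\relax
\citation{lamport94}    % written by \cite{lamport94}
\citation{texbook}      % written by \cite{texbook}
\bibcite{texbook}{1}    % written by \bibitem{texbook}: key -> label 1
\bibcite{lamport94}{2}  % written by \bibitem{lamport94}: key -> label 2
```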

On Overleaf, though, you don’t have to worry about re-running pdflatex yourself. This is because Overleaf uses the latexmk build tool, which automatically re-runs pdflatex (and some other processors) as many times as needed to resolve \cite outputs. The same applies to other cross-referencing commands, such as \ref and \tableofcontents.

A note on compilation times

Processing LaTeX reference lists or other forms of cross-referencing, such as indexes, requires multiple runs of software, including the TeX engine (e.g., pdflatex) and associated programs such as BibTeX, makeindex, etc. As mentioned above, Overleaf handles all of these multiple runs automatically, so you don’t have to worry about them. As a consequence, when the preview on Overleaf is refreshing for documents with bibliographies (or other cross-referencing), or for documents with large image files (as discussed in a separate article), these essential compilation steps can sometimes make the preview refresh appear to take longer than on your own machine. We do, of course, aim to keep it as short as possible! If you feel your document is taking longer to compile than you’d expect, there are further tips available that may help.

Enter BibTeX

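There are, of course, some inconveniences with manually preparing the thebibliography list: you have to format every \bibitem yourself in whatever style the journal or institution requires; switching to a different style means reformatting every entry by hand; and entries appear in the order you type them, so any required sorting is also up to you.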

This is where BibTeX and bibliography database files (.bib files) are extremely useful, and this is the recommended approach for managing citations and references in most journals and theses. The biblatex approach, which is slightly different and gaining popularity, also requires a .bib file, but we’ll talk about biblatex in a future post.

Instead of formatting cited reference entries in a thebibliography list, we maintain a bibliography database file (let’s name it refs.bib for our example) which contains format-independent information about our references. So our refs.bib file may look like this:

@book{texbook,
  author = {Donald E. Knuth},
  year = {1986},
  title = {The {\TeX} Book},
  publisher = {Addison-Wesley Professional}
}

@book{latex:companion,
  author = {Frank Mittelbach and Michel Goossens and Johannes Braams and David Carlisle and Chris Rowley},
  year = {2004},
  title = {The {\LaTeX} Companion},
  publisher = {Addison-Wesley Professional},
  edition = {2}
}

@book{latex2e,
  author = {Leslie Lamport},
  year = {1994},
  title = {{\LaTeX}: a Document Preparation System},
  publisher = {Addison Wesley},
  address = {Massachusetts},
  edition = {2}
}

@article{knuth:1984,
  title={Literate Programming},
  author={Donald E. Knuth},
  journal={The Computer Journal},
  volume={27},
  number={2},
  pages={97--111},
  year={1984},
  publisher={Oxford University Press}
}

@inproceedings{lesk:1977,
  title={Computer Typesetting of Technical Journals on {UNIX}},
  author={Michael Lesk and Brian Kernighan},
  booktitle={Proceedings of American Federation of Information Processing Societies: 1977 National Computer Conference},
  pages={879--888},
  year={1977},
  address={Dallas, Texas}
}

More information about other BibTeX entry types and fields is available online, including a large table showing which fields are supported by each entry type. We’ll talk more about how to prepare .bib files in a later section.

We can use \cite with the cite keys as before, but now we replace thebibliography with \bibliographystyle{...} to choose the reference style, and \bibliography{...} to point BibTeX at the .bib file where the cited references should be looked up (the file name is given without the .bib extension).

\LaTeX{} \cite{latex2e} is a set of macros built atop \TeX{} \cite{texbook}.

\bibliographystyle{plain} % We choose the "plain" reference style
\bibliography{refs}       % Entries are in the refs.bib file
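For completeness, a minimal main.tex around that snippet might look like the sketch below. The article class is an assumption, and refs.bib is expected to sit in the same directory as main.tex.

```latex
\documentclass{article}

\begin{document}

\LaTeX{} \cite{latex2e} is a set of macros built atop \TeX{} \cite{texbook}.

% Formatting comes from plain.bst; entries come from refs.bib.
\bibliographystyle{plain}
\bibliography{refs}  % no .bib extension here

\end{document}
```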

This is processed with the following sequence of commands, assuming our LaTeX document is in a file named main.tex (and that we are using pdflatex):

  1. pdflatex main
  2. bibtex main
  3. pdflatex main
  4. pdflatex main

and we get the following output:


Figure 4: BibTeX output using the plain bibliography style.

Whoa! What’s going on here, and why are all those (repeated) processes required? Well, here’s what happens.

  1. During the first pdflatex run, all pdflatex sees is a \bibliographystyle and a \bibliography from main.tex. It doesn’t know what all the \cite commands are about! Consequently, within the output PDF, all the \cite commands are simply rendered as [?], and no reference list appears, for now. But pdflatex writes information about the bibliography style and .bib file, as well as all occurrences of \cite, to the file main.aux.
  2. It’s actually main.aux that BibTeX is interested in! It notes the .bib file indicated by \bibliography, then looks up all the entries with keys that match the \cite commands used in the .tex file. BibTeX then uses the style specified with \bibliographystyle to format the cited entries, and writes a formatted thebibliography list into the file main.bbl. The production of the .bbl file is all that’s achieved in this step; no changes are made to the output PDF.
  3. When pdflatex is run again, it now sees that a main.bbl file is available! So it inserts the contents of main.bbl, i.e. the \begin{thebibliography}...\end{thebibliography} list, into the LaTeX source at the point where \bibliography appears. After this step, the reference list appears in the output PDF formatted according to the chosen \bibliographystyle, but the in-text citations are still [?].
  4. pdflatex is run again, and this time the \cite commands are replaced with the corresponding numerical labels in the output PDF!
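To see the hand-off between the steps, the sketch below shows an excerpt of main.aux after step 1 and, roughly, the main.bbl that BibTeX writes in step 2 for our two citations. The exact .bbl text may differ slightly between BibTeX versions; note that plain sorts entries by author, so texbook (Knuth) becomes [1] and latex2e (Lamport) becomes [2].

```latex
% --- main.aux after step 1 (excerpt) ---
\relax
\citation{latex2e}   % from \cite{latex2e}
\citation{texbook}   % from \cite{texbook}
\bibstyle{plain}     % from \bibliographystyle{plain}
\bibdata{refs}       % from \bibliography{refs}

% --- main.bbl after step 2 (approximate) ---
\begin{thebibliography}{1}

\bibitem{texbook}
Donald~E. Knuth.
\newblock {\em The {\TeX} Book}.
\newblock Addison-Wesley Professional, 1986.

\bibitem{latex2e}
Leslie Lamport.
\newblock {\em {\LaTeX}: a Document Preparation System}.
\newblock Addison Wesley, Massachusetts, 2 edition, 1994.

\end{thebibliography}
```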

As before, the latexmk build tool takes care of triggering and re-running pdflatex and bibtex as necessary, so you don’t have to worry about this bit.

Some notes on using BibTeX and .bib files

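A few further things to note about using BibTeX and .bib files: only the entries you actually cite (or include with \nocite) appear in the reference list, however many entries the .bib file contains; and because the formatting comes from the style, you can change the look of the whole list by changing just the \bibliographystyle line. For example, switching from plain to IEEEtran gives this output: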


Figure 5: IEEEtran bibliography style output.
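A minimal sketch of both points, reusing the refs.bib file from earlier. It assumes IEEEtran.bst is installed, as it typically is in TeX Live; knuth:1984 is listed without being cited in the text.

```latex
% Changing the style only means changing this one line.
\bibliographystyle{IEEEtran}

% List an entry in the references without citing it in the text...
\nocite{knuth:1984}
% ...or list every entry in refs.bib:
% \nocite{*}

\bibliography{refs}
```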

We’ll talk more about different bibliography styles, including author–year citation schemes, in a future article. For now, let’s turn our attention to .bib file contents, and how we can make the task of preparing .bib files a bit easier.

Taking another look at .bib files

As you may have noticed earlier, a .bib file contains BibTeX bibliography entries that start with an entry type prefixed with an @. Each entry contains some key–value BibTeX fields, placed within a pair of braces ({...}). The cite key is the first piece of information given within these braces, and the fields in the entry must be separated by commas:

@article{knuth:1984,
  title={Literate Programming},
  author={Donald E. Knuth},
  journal={The Computer Journal},
  volume={27},
  number={2},
  pages={97--111},
  year={1984},
  publisher={Oxford University Press}
}
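Braces are not the only way to delimit a field value. As a sketch, the same entry can also be written with double quotes, and purely numeric values can be left bare. Extra inner braces, as in {UNIX} in the lesk:1977 entry earlier, stop a style from changing the case of that word. Text outside entries is ignored by BibTeX, so it is safest to keep comments there rather than inside an entry.

```bibtex
% The knuth:1984 entry again, using double quotes instead of braces.
% Numeric values such as volume, number and year need no delimiters.
@article{knuth:1984,
  title     = "Literate Programming",
  author    = "Donald E. Knuth",
  journal   = "The Computer Journal",
  volume    = 27,
  number    = 2,
  pages     = "97--111",
  year      = 1984,
  publisher = "Oxford University Press"
}
```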

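As a general rule, every bibliography entry should have an author, year and title field, no matter what the type is. There are about a dozen standard entry types, although some bibliography styles may recognise or define more. Besides @article and @book, which we have already seen, the ones you are likely to use most frequently include @inproceedings (a paper in conference proceedings), @phdthesis (and @mastersthesis), @inbook (a chapter or other part of a book) and @incollection (a separately titled part of an edited book), as in these examples: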

@inproceedings{FosterEtAl:2003,
  author = {George Foster and Simona Gandrabur and Philippe Langlais and Pierre Plamondon and Graham Russell and Michel Simard},
  title = {Statistical Machine Translation: Rapid Development with Limited Resources},
  booktitle = {{Proceedings of MT Summit IX}},
  year = {2003},
  pages = {110--119},
  address = {New Orleans, USA}
}

@phdthesis{Alsolami:2012,
  title = {An examination of keystroke dynamics for continuous user authentication},
  school = {Queensland University of Technology},
  author = {Eesa Alsolami},
  year = {2012}
}

@inbook{peyret2012:ch7,
  title={Computational Methods for Fluid Flow},
  edition={2},
  author={Peyret, Roger and Taylor, Thomas D},
  year={1983},
  publisher={Springer-Verlag},
  address={New York},
  chapter={7, 14}
}

@incollection{Mihalcea:2006,
  author = {Rada Mihalcea},
  title = {{Knowledge-Based Methods for WSD}},
  booktitle = {Word Sense Disambiguation: Algorithms and Applications},
  publisher = {Springer},
  year = {2006},
  editor = {Eneko Agirre and Philip Edmonds},
  pages = {107--132},
  address = {Dordrecht, the Netherlands}
}
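Once entries like these are added to refs.bib (an assumption for this sketch), they are cited exactly as before. One \cite can take several comma-separated keys, and its optional argument attaches a note, such as a chapter or page, to the label. The sentences themselves are placeholders.

```latex
% Several keys in a single \cite, separated by commas.
Examples of proceedings and collection entries are
\cite{FosterEtAl:2003, Mihalcea:2006}.

% The optional argument adds a note to the citation label.
The thesis \cite{Alsolami:2012} examines keystroke dynamics; for fluid flow
methods see \cite[chap.~7]{peyret2012:ch7}.

\bibliographystyle{plain}
\bibliography{refs}
```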