arXivBib quickstart
(for arXivBib version 1.00)

Copyright © 2002-2005, John Forkosh Associates, Inc.
email: john@forkosh.com



Contents
Introduction
Input preparation
Output
Building arXivBib
Running arXivBib
GPL License
Concluding Remarks

Related Pages

BibTeX on CTAN
BibTeX tutorial

arXivBib Summary

Installation:     Download arxivbib.zip and then type
    unzip arxivbib.zip
    cc arxivbib.c -o arxivbib
 
Usage:     Prepare inputfile in the form...
      quant-ph/9812037\\cite=Aha:99b\\entry=article
      math.OA/0404553 \\ cite=Kribs:04 \\ etc=etc
      etc
Then enter the command
    nohup ./arxivbib < inputfile > outputfile &
and outputfile will be in the form...
      @article{Aha:99b,
         author      = "Dorit Aharonov",
         month       = "December",
         note        = "77 pages,...",
         title       = "Quantum Computation",
         year        =  1998,
         abstract    = "In the last few years...",
         date        = "Tue, 15 Dec 1998",
         eprint      = "quant-ph/9812037"  }
      @etc

Introduction  

arXivBib, licensed under the GPL, retrieves abstract pages from arXiv.org and reformats them as BibTeX entries. It saves a lot of typing if you have many such references to cite, but isn't worth the trouble if you only have a few.

Input preparation  

An input file containing lines in the form

    quant-ph/9812037 \\ cite=Aha:99b \\ entry=article
    math.OA/0404553 \\ cite = Kribs:04 \\ etc = etc
    quant-ph/0506082 \\
       ?publisher=unpublished \\ +note=additional comments
    etc. 

specifies which abstracts to retrieve.

Basic input rules...

Each input entry consists of one or more fields separated by \\. The first field is usually the arXiv reference, as illustrated above. Subsequent fields are optional, and all have the form name = value. Whitespace before and after \\ and = is optional and ignored. Any number of fields, with any names and values whatsoever, is permitted. (How your fields are used, or whether they're used at all, depends on your entry-type definitions.)
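To make these rules concrete, here is a minimal C sketch (hypothetical, not arXivBib's actual parser) that splits one entry, already joined into a single string, into its fields and trims the whitespace around each name = value pair. The names split_fields, trim and MAXFIELDS are inventions for this example.

```c
#include <ctype.h>
#include <string.h>

#define MAXFIELDS 32   /* hypothetical limit for this sketch */

/* Trim leading and trailing whitespace in place; returns the new start. */
static char *trim(char *s) {
    char *end;
    while (isspace((unsigned char)*s))
        s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        *--end = '\0';
    return s;
}

/* Split one entry (modified in place) into fields separated by the two
 * characters \\ (written "\\\\" in C).  For name = value fields, names[i]
 * and values[i] get the trimmed halves; a field with no '=' (normally the
 * leading arXiv reference) gets names[i] = NULL.  Returns the field count. */
static int split_fields(char *entry, char *names[], char *values[]) {
    int n = 0;
    char *field = entry;
    while (field != NULL && n < MAXFIELDS) {
        char *sep = strstr(field, "\\\\");
        char *eq;
        if (sep != NULL) {
            *sep = '\0';          /* terminate this field ...            */
            sep += 2;             /* ... and step past the \\ separator */
        }
        eq = strchr(field, '=');  /* first '=' splits name from value */
        if (eq != NULL) {
            *eq = '\0';
            names[n] = trim(field);
            values[n] = trim(eq + 1);
        } else {
            names[n] = NULL;
            values[n] = trim(field);
        }
        n++;
        field = sep;              /* NULL after the last field */
    }
    return n;
}
```

For example, splitting quant-ph/9812037 \\ cite = Aha:99b \\ entry=article yields three fields: the unnamed reference, cite = Aha:99b and entry = article.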

A \\ at the end of an input line signals that the same entry continues on the next line with another field. A single \ at the end of a line signals that the current field continues on the next line. A line ending with neither \ nor \\ ends that entry.
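The end-of-line rules can be sketched as a small C classifier. The enum and function names are hypothetical, and the sketch assumes that trailing whitespace after the backslashes should be ignored.

```c
#include <string.h>

/* Hypothetical names for the three cases described above. */
enum line_end { END_OF_ENTRY, NEXT_FIELD, SAME_FIELD };

/* Classify a line by its last non-whitespace characters:
 *   ends with \\  : another field of this entry follows on the next line
 *   ends with \   : the current field continues on the next line
 *   otherwise     : this line ends the entry */
static enum line_end classify_line_end(const char *line) {
    size_t len = strlen(line);
    while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t' ||
                       line[len - 1] == '\n' || line[len - 1] == '\r'))
        len--;                                    /* ignore trailing blanks */
    if (len >= 2 && line[len - 1] == '\\' && line[len - 2] == '\\')
        return NEXT_FIELD;
    if (len >= 1 && line[len - 1] == '\\')
        return SAME_FIELD;
    return END_OF_ENTRY;
}
```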

A cite= field, if present, specifies the LaTeX \cite{key} for the entry, and defaults to your arXiv reference, e.g., quant-ph/9812037, if not present. An entry= field, if present, specifies the BibTeX entry-type, e.g., entry=book for @book, and usually defaults to @article or @unpublished if not present.

Any other fields are merged with the abstract data read from arxiv.org. For example, author=John Doe replaces the paper's true author(s). But the leading ? on ?publisher=unpublished (in the example input above) means that field is used _only_ if the abstract contains no publisher field. Similarly, the leading + on +note=additional comments means it's concatenated to the abstract's note.
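A minimal C sketch of the three merge rules for a single field, assuming the ? or + prefix has already been split off the field name. Whether + inserts a separating space before the appended text is an assumption of this sketch, as are all the names.

```c
#include <stdio.h>
#include <string.h>

/* current: value parsed from the abstract ("" if the abstract had none)
 * prefix:  '\0' for a plain field, '?' or '+' as described above
 * user:    value from the input file
 * out:     receives the merged value */
static void merge_field(const char *current, char prefix, const char *user,
                        char *out, size_t outsize) {
    if (prefix == '?')              /* use only if the abstract lacks it */
        snprintf(out, outsize, "%s", current[0] != '\0' ? current : user);
    else if (prefix == '+')         /* append to the abstract's value */
        snprintf(out, outsize, "%s%s%s", current,
                 current[0] != '\0' ? " " : "", user);
    else                            /* plain name=value replaces it */
        snprintf(out, outsize, "%s", user);
}
```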

Lines beginning with (i.e., whose first non-whitespace characters are) @string or @preamble are copied to the output file, so these directives are simply passed on to BibTeX. Similarly, any line beginning with # is an arXivBib comment, and comments are also copied to the output file.

ArXiv input...

Papers submitted to arXiv.org are accompanied by formatted abstract pages, which arXivBib parses using the following abstract-to-entry field mapping:

    arXiv abstract                   BibTeX entry
    Authors:                         author =
    lines comprising the title       title =
    Date:                            both year = and month =
    Journal-ref:                     journal =
    Report-no:                       report =
    Comments:                        note =
    Subj-class:                      subj-class =
    MSC-class:                       msc-class =
    ACM-class:                       acm-class =
    DOI:                             doi =
    lines comprising the abstract    abstract =
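The same mapping, written as a C lookup table for illustration (a hypothetical structure, not taken from arxivbib.c). Title and abstract don't appear because, as the table says, they come from the lines making up the title and abstract rather than from a labeled field.

```c
#include <string.h>

struct field_map {
    const char *arxiv_label;    /* label on the arXiv abstract page */
    const char *bibtex_field;   /* field name in the BibTeX entry   */
};

/* Date: appears twice because it feeds both year and month. */
static const struct field_map FIELD_MAP[] = {
    { "Authors:",     "author"     },
    { "Date:",        "year"       },
    { "Date:",        "month"      },
    { "Journal-ref:", "journal"    },
    { "Report-no:",   "report"     },
    { "Comments:",    "note"       },
    { "Subj-class:",  "subj-class" },
    { "MSC-class:",   "msc-class"  },
    { "ACM-class:",   "acm-class"  },
    { "DOI:",         "doi"        },
};

/* Return the first BibTeX field mapped from an arXiv label, or NULL. */
static const char *bibtex_field_for(const char *label) {
    size_t i;
    for (i = 0; i < sizeof FIELD_MAP / sizeof FIELD_MAP[0]; i++)
        if (strcmp(FIELD_MAP[i].arxiv_label, label) == 0)
            return FIELD_MAP[i].bibtex_field;
    return NULL;
}
```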

Parsed abstract information is output as BibTeX entries in a more-or-less obvious manner. Fields like report=, subj-class=, msc-class=, acm-class= and doi= are ignored by the BibTeX .bst styles I'm aware of.

Future versions of arXivBib will parse abstracts more carefully. For example, the arXiv Journal-ref: field frequently contains volume, number, pages, year, etc., which arXivBib could try to interpret as separate BibTeX fields. (In the particular case of Journal-ref:, authors usually write standard references like Phys.Lett. B630 (2005) 68-72, which typeset nicely in your bibliography even when formatted as a single BibTeX field.)
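As a rough sketch of what such parsing might look like (hypothetical; arXivBib doesn't do this yet), sscanf can split a reference in the exact shape of the example above. It assumes the journal name contains no spaces, so a reference like Annual Reviews of Computational Physics simply fails to match and would be left as a single field.

```c
#include <stdio.h>
#include <string.h>

struct journal_ref {
    char journal[64];
    char volume[16];
    int year, first_page, last_page;
};

/* Returns 1 if s has the shape "Journal Volume (Year) First-Last". */
static int parse_journal_ref(const char *s, struct journal_ref *r) {
    memset(r, 0, sizeof *r);
    return sscanf(s, "%63s %15s (%d) %d-%d", r->journal, r->volume,
                  &r->year, &r->first_page, &r->last_page) == 5;
}
```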

Additional user input options...

Instead of arXiv references like quant-ph/9812037, if your first field begins with an @, like @article (but not @string or @preamble, mentioned above), it's taken to be a BibTeX entry-type. In this case, arXivBib doesn't retrieve any abstract, but merely reformats your remaining fields as a BibTeX entry. For example,

    @book \\ cite = nielsen:2000 \\ title = Quantum \
    Computation and Quantum Information \\ author = Michael A. \
    Nielsen, Isaac L. Chuang \\ publisher = Cambridge U.P. \\
    year = 2000 \\ ISBN = 0-521-63503-9 

just produces a BibTeX @book entry

    @book{nielsen:2000,
       author      = "Michael A. Nielsen and Isaac L. Chuang",
       title       = "Quantum Computation and Quantum Information",
       year        =  2000,
       publisher   = "Cambridge U.P.",
       isbn        = "0-521-63503-9"  } 

In principle, you can write all your .bib files this way, even without any abstract retrievals at all, and just use arXivBib as a simple reformatting tool.
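The rules for which input lines are passed through, which start a user-formatted entry, and which trigger an abstract retrieval can be sketched in C as follows. The names are hypothetical, the @string and @preamble match is case-sensitive here (an assumption), and the function is meant for the first line of an entry only, not for continuation lines.

```c
#include <ctype.h>
#include <string.h>

enum line_kind {
    LINE_PASSTHROUGH,   /* # comment, @string or @preamble: copied to output */
    LINE_USER_ENTRY,    /* @anytype: format the user's fields, no retrieval  */
    LINE_ARXIV_ENTRY    /* anything else: an arXiv reference to retrieve     */
};

static enum line_kind classify_line(const char *line) {
    while (isspace((unsigned char)*line))   /* look at first non-blank */
        line++;
    if (*line == '#')
        return LINE_PASSTHROUGH;
    if (*line == '@') {
        if (strncmp(line, "@string", 7) == 0 ||
            strncmp(line, "@preamble", 9) == 0)
            return LINE_PASSTHROUGH;
        return LINE_USER_ENTRY;
    }
    return LINE_ARXIV_ENTRY;
}
```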

Any @entry-type in the first field (or entry=entry-type in a subsequent field when retrieving arXiv abstracts) is permitted, whether or not BibTeX defines it. ArXivBib simply outputs every field it has data for. Input like quant-ph/9812037 \\ entry=anytype \\ etc simply formats a BibTeX @anytype{etc...} entry containing all fields parsed from the arXiv abstract, merged with any additional fields from the input.

Similarly, as already mentioned above, arXivBib's optional name = value fields may contain any name and value whatsoever. Silly example input like @anytype \\ cite=tester \\ salutation=Hello just outputs

    @anytype{tester,
       salutation  = "Hello"  } 

Note that arXivBib wasn't written as a reformatting tool, but thinking of it that way may help clarify its intended purpose: to parse arXiv abstracts, merge that data with any additional fields you supply, and output a BibTeX-formatted entry.

arXivBib output  

After preparing your input file, you can issue the following command from the Unix shell prompt
        nohup ./arxivbib < inputfile > outputfile &
The nohup...& runs arXivBib in the background because it waits 15 seconds between abstract retrievals to avoid tripping arXiv's robot detection. The program eventually runs to completion and produces an output file containing (by default) BibTeX-formatted entries. Some other (non-default) output options are discussed below.
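A minimal POSIX C sketch of this pacing: the fetch callback stands in for whatever actually retrieves one abstract, and fetch_all is a hypothetical name. Skipping the delay before the first request, and reporting a failed retrieval without stopping the run, are both assumptions of this sketch.

```c
#include <stdio.h>
#include <unistd.h>   /* sleep(), POSIX */

/* Retrieve each reference in turn, pausing delay_seconds between requests.
 * Returns the number of failed retrievals. */
static int fetch_all(const char *refs[], int n, unsigned delay_seconds,
                     int (*fetch)(const char *ref)) {
    int i, failures = 0;
    for (i = 0; i < n; i++) {
        if (i > 0)
            sleep(delay_seconds);   /* e.g. 15 s, adjustable with -s */
        if (fetch(refs[i]) != 0) {
            fprintf(stderr, "retrieval failed for %s\n", refs[i]);
            failures++;
        }
    }
    return failures;
}
```

Calling fetch_all(refs, n, 15, fetch_one), where fetch_one is a hypothetical routine that runs lynx, would reproduce the default pacing.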

arXivBib edits...

Abstracts on arxiv.org aren't required to comply with BibTeX or LaTeX formatting rules. Authors separated by commas, names like Schr\"odinger (which BibTeX sees as the beginning of an unmatched "..." string), math-mode expressions not surrounded by $...$, etc., are very common. arXivBib tries to remedy these problems and generate output that's accepted by both BibTeX and LaTeX. That doesn't always work, and sometimes results in over-aggressive editing. For example, authors often use a \symbol for which they have a \newcommand; arXivBib rewrites this as $\symbol$, but it's still undefined in your bibliography.

You should thus anticipate some errors from BibTeX and/or LaTeX when using arXivBib. Just edit its output .bib file and manually correct any problems arXivBib couldn't handle.
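For illustration only, here is a C sketch of two edits of this kind: turning comma-separated authors into BibTeX's "and"-separated form, and bracing \"o-style accents so the quote can't end a "..." field. These are not arXivBib's actual (undocumented) rules, and the function name is hypothetical. The comma rule assumes the "First Last, First Last" author style; it would mangle a "Last, First" list.

```c
#include <string.h>

/* Copy in to out, applying both edits; stops early if out is too small. */
static void edit_authors(const char *in, char *out, size_t outsize) {
    size_t j = 0;
    if (outsize == 0)
        return;
    while (*in != '\0' && j + 6 < outsize) {   /* room for the longest step */
        if (*in == ',') {
            const char *and_sep = " and ";
            while (*and_sep != '\0')
                out[j++] = *and_sep++;
            in++;
            while (*in == ' ')                 /* drop spaces after the comma */
                in++;
        } else if (in[0] == '\\' && in[1] == '"' && in[2] != '\0') {
            out[j++] = '{';                    /* \"o  becomes  {\"o} */
            out[j++] = '\\';
            out[j++] = '"';
            out[j++] = in[2];
            out[j++] = '}';
            in += 3;
        } else {
            out[j++] = *in++;
        }
    }
    out[j] = '\0';
}
```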

In this early release, arXivBib's preliminary rewrite rules are subject to change, so I won't document them in detail. But you can turn off all arXivBib edits with the -e command-line switch, e.g.,
        nohup ./arxivbib   -e   < inputfile > outputfile &
makes no changes to text from retrieved abstracts, authors, titles, etc.

BibTeX output...

By default, arXivBib's output consists of BibTeX entries in the form (the quant-ph/9812037 example is illustrated)

    @article{Aha:99b,
       author      = "Dorit Aharonov",
       month       = "December",
       note        = "77 pages, figures included in the ps file. To
                      appear in: Annual Reviews of Computational
                      Physics, ed. Dietrich Stauffer, World Scientific,
                      vol VI, 1998. The paper can be down loaded also
                      from this http URL",
       title       = "Quantum Computation",
       year        =  1998,
       abstract    = "In the last few years, theoretical study of
                      quantum systems serving as computational devices
                      has achieved tremendous progress.  ...etc...
                      Quantum algorithms, including Shor's factorization
                      algorithm and Grover's algorithm for",
       abstract1   = "searching databases, are explained.  ...etc...
                      discussing the possible implications of
                      quantum computation on fundamental physical
                      questions, such as the transition from quantum to
                      classical physics.",
       date        = "Tue, 15 Dec 1998",
       eprint      = "quant-ph/9812037"  }
    @etc 

If your input contains no entry= field, then arXivBib defaults to @article if the abstract contains a Journal-ref: field, or to @unpublished if it doesn't. You can change one or both defaults by using the -t and/or -u switches when you run arXivBib. For example,
        nohup ./arxivbib   -t techreport   -u misc   < inputfile > arxivoutputfile &
formats a @techreport entry if the abstract contains a Journal-ref: field, or a @misc entry if it doesn't.
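The default entry-type rule is small enough to state as a C function (a hypothetical sketch with invented names). An explicit entry= field from the input takes precedence, as described under Input preparation.

```c
#include <stddef.h>

/* journal_type and other_type hold the -t and -u values
 * (default "article" and "unpublished"). */
static const char *entry_type(const char *user_entry, int has_journal_ref,
                              const char *journal_type,
                              const char *other_type) {
    if (user_entry != NULL && user_entry[0] != '\0')
        return user_entry;                  /* entry= always wins */
    return has_journal_ref ? journal_type : other_type;
}
```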

The text of the abstract is also retrieved by arXivBib and placed in the abstract= field. BibTeX seems to have a 1000-character field-length limit, so arXivBib breaks long fields into several shorter ones, as illustrated above. Had the example abstract been even longer, you'd see an abstract2 field, etc. How that appears in your LaTeX output is up to your .bst style file, but at least you'll have the abstract in your personal .bib file.
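A C sketch of this field splitting, assuming (as the abstract/abstract1 example suggests) that fields are broken at a space so words stay whole. print_split_field is a hypothetical name, and the 990 default matches the MAXFLDLEN compile option described under Building arXivBib. Every field is followed by a comma here, which BibTeX accepts even on an entry's last field.

```c
#include <stdio.h>
#include <string.h>

#define MAXFLDLEN 990   /* default maximum field length (see -DMAXFLDLEN) */

/* Print value as fields name, name1, name2, ... of at most maxlen
 * characters each, breaking at the last space that fits.
 * Returns the number of fields printed. */
static int print_split_field(FILE *fp, const char *name, const char *value,
                             size_t maxlen) {
    int part = 0;
    size_t len = strlen(value);
    char fname[64];
    if (maxlen == 0)
        return 0;
    while (len > 0) {
        size_t take = len;
        if (take > maxlen) {
            take = maxlen;
            while (take > 0 && value[take] != ' ')   /* back up to a space */
                take--;
            if (take == 0)
                take = maxlen;                       /* no space: hard split */
        }
        if (part == 0)
            snprintf(fname, sizeof fname, "%s", name);
        else
            snprintf(fname, sizeof fname, "%s%d", name, part);
        fprintf(fp, "   %-11s = \"%.*s\",\n", fname, (int)take, value);
        value += take;
        len -= take;
        while (len > 0 && *value == ' ') {           /* skip the break space */
            value++;
            len--;
        }
        part++;
    }
    return part;
}
```

In this sketch, print_split_field(stdout, "abstract", text, MAXFLDLEN) would print abstract, abstract1, ... as needed.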

arXivBib output...

Two output formats are currently available from arXivBib, selected with the optional -t1 or -t2 switch on the command line. We just described the default -t2 BibTeX format (you needn't write the -t switch at all if you want the default). Format -t1, for which you can just write -t (without an entry-type), produces output in arXivBib's input format, described above. Use this format together with the -e switch to produce unedited output that arXivBib can re-read on subsequent runs, formatting BibTeX-readable entries without re-querying arxiv.org for the same information.
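The -t1 layout can be sketched as a short C writer (hypothetical; it omits the wrapping of long values with a single trailing \ that the real output uses, as in the example below). Every line except the last ends in \\, so the result reads back as a single entry.

```c
#include <stdio.h>
#include <string.h>

/* Write one entry in arXivBib's input format: the reference, then one
 * "name = value" field per line. */
static void write_native(FILE *fp, const char *ref, const char *names[],
                         const char *values[], int n) {
    int i;
    fprintf(fp, "%s%s\n", ref, n > 0 ? " \\\\" : "");
    for (i = 0; i < n; i++)
        fprintf(fp, "    %s = %s%s\n", names[i], values[i],
                i < n - 1 ? " \\\\" : "");   /* last field ends the entry */
}
```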

For example,
        nohup ./arxivbib   -e   -t   < inputfile > arxivoutputfile &
produces arxivoutputfile containing unedited arxiv.org abstract data, merged with any optional fields you provided, in the form (for the same quant-ph/9812037 illustrated above)

quant-ph/9812037 \\
    cite = Aha:99b \\
    author = Dorit Aharonov \\
    month = December \\
    note = 77 pages, figures included in the ps file. To appear in: \
      Annual Reviews of Computational Physics, ed. Dietrich Stauffer, \
      World Scientific, vol VI, 1998. The paper can be down loaded \
      also from this http URL \\
    title = Quantum Computation \\
    year = 1998 \\
    abstract = In the last few years, theoretical study of quantum \
      systems serving as computational devices has achieved tremendous \
      progress. ...etc... In the end of this review I make these \
      connections explicit, discussing the possible implications of \
      quantum computation on fundamental physical questions, such as \
      the transition from quantum to classical physics. \\
    date = Tue, 15 Dec 1998 \\
    eprint = quant-ph/9812037 

The arxivoutputfile produced by this first run of arXivBib can now be used as input to a second run
        ./arxivbib   -x   < arxivoutputfile > finaloutputfile
that produces finaloutputfile containing edited, default -t2 BibTeX-formatted entries. Particularly note the -x switch on this second run. It turns off all arxiv.org abstract retrievals, and with them all the 15-second delays (so nohup...& isn't necessary with -x). The second run thus uses only data from the input file, which is fine since that file already contains the fields derived from arxiv.org abstracts.

Thus, this finaloutputfile is byte-for-byte identical to the BibTeX output described above. So why use a more complicated two-step process to obtain exactly the same result? Several reasons:

Building arXivBib  


Very quickly: download arxivbib.zip to any Unix/Linux box and then type
    unzip arxivbib.zip
    cc arxivbib.c -o arxivbib
Read the rest of this section only if you want more detailed information.

I've built and run arXivBib under Linux and NetBSD using gcc. The source code is ANSI-standard C, and should compile and run under any Unix-like environment without change. During execution, arXivBib issues system() calls of the form
            lynx -dump http://arxiv.org/abs/quant-ph/0506082 > tempfile
so the lynx browser must be on your path.
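A minimal sketch of how such a command might be built and issued from C. The #ifndef defaults let the -DCOMMAND and -DXFILE compile switches described below override them, a common C idiom; that arxivbib.c does it this way is an assumption, not a quote. The rejection of shell metacharacters is an extra precaution added for this sketch, since the reference is pasted into a shell command.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifndef COMMAND
#define COMMAND "lynx -dump http://arxiv.org/abs/"
#endif
#ifndef XFILE
#define XFILE "/tmp/arxivbib.out"
#endif

/* Build "COMMAND<ref> > XFILE" in cmd and, if run is nonzero, execute it.
 * Returns system()'s status, 0 if not run, or -1 if ref looks unsafe or
 * the command didn't fit in cmd. */
static int build_and_run(const char *ref, char *cmd, size_t cmdsize, int run) {
    int n;
    if (strpbrk(ref, " \t;&|<>()`$'\"\\*?") != NULL)
        return -1;                      /* refuse shell metacharacters */
    n = snprintf(cmd, cmdsize, "%s%s > %s", COMMAND, ref, XFILE);
    if (n < 0 || (size_t)n >= cmdsize)
        return -1;                      /* command was truncated */
    return run ? system(cmd) : 0;
}
```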

The steps needed to download and compile arXivBib are just the three given above: download arxivbib.zip, unzip it, and compile arxivbib.c with cc.

That's all there is to compiling arXivBib. You may also optionally include the following -D switches on the compile line, whose functionality is as follows...

-DCOMMAND=\"lynx -dump http://arxiv.org/abs/\"
Arxiv.org abstracts are obtained by issuing the above command, with your input like quant-ph/9812037 tacked on. You can modify this default command to include any needed path prefix, etc.
-DMAXFLDLEN=990
As mentioned above, BibTeX appears to have a 1000-character maximum field length. Any field (the abstract field in particular) longer than MAXFLDLEN is split across several fields. These continuation fields are named with 1, 2, etc. tacked onto the original field name, e.g., abstract, abstract1, etc.
-DUNPUBLISHED=\"unpublished\"
By default, an @article entry-type is formatted for arxiv.org abstracts that contain a Journal-ref: field. Otherwise, an @unpublished entry-type is formatted; use -DUNPUBLISHED to change this to any entry-type you prefer.
-DXFILE=\"/tmp/arxivbib.out\"
By default, arXivBib instructs lynx to -dump arxiv abstracts to the temporary file /tmp/arxivbib.out which you can change to any more convenient location.

For example,
        cc -DXFILE=\"tempfile\" -DMAXFLDLEN=9999 arxivbib.c -o arxivbib
compiles arxivbib so that it writes abstracts to a file called tempfile in the current working directory from which it was launched, and with a very large maximum field length (effectively turning off field splitting).

Running arXivBib from the Unix shell  

ArXivBib runs from the command line on Unix-like systems. It sleeps 15 seconds between abstract retrievals, as recommended by mjf at www-admin@arXiv.org, to avoid tripping arXiv's robot detection. Because of this sleep delay, arXivBib should either be run using nohup, or as a daemon using the -d switch on its command line. In the simplest case, this would look either like
        nohup ./arxivbib < inputfile > outputfile &
or like
        ./arxivbib -d -f inputfile -o outputfile
Stdin and stdout cannot be redirected when run as a daemon, so use the provided -f and -o switches instead, as illustrated. The examples in this document all use the nohup...& form, but both forms are equivalent, and you can use whichever is more convenient (e.g., some systems don't permit ordinary users to run nohup).

In addition to the preceding -d and -f and -o, arXivBib provides various additional command-line switches (most of which have already been mentioned) as follows...

-d
Runs arXivBib as a Unix daemon. You must also use both -f and -o switches in conjunction with the -d switch.
-e
Turns off arXivBib's field-level edits.
-f   inputfile
Input will be read from inputfile rather than from stdin. If you like, you may use -f when running arXivBib nohup...&. You must use -f when running arXivBib as a daemon with the -d switch.
-o   outputfile
Output will be written to outputfile rather than to stdout. If you like, you may use -o when running arXivBib nohup...&. You must use -o when running arXivBib as a daemon with the -d switch.
-s 15
ArXivBib sleeps 15 seconds between abstract retrievals to avoid tripping arXiv's robot detection. But you can change the sleep delay to any other interval using the -s switch, followed by your interval in seconds.
-t
Format output in arXivBib's native input format, rather than as BibTeX entries.
-t   entry-type
Format an @entry-type BibTeX entry if the abstract contains a Journal-ref: field.
-u   entry-type
Format an @entry-type BibTeX entry if the abstract doesn't contain a Journal-ref: field.
-x
Don't retrieve abstracts from arxiv.org.
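To summarize these switches, here is a hypothetical C sketch of a parser for them (not arXivBib's own code). It resolves the -t ambiguity by treating a following word that doesn't start with - as an entry-type, and a bare -t, or -t1, as the native output format; -t2 selects BibTeX. It also enforces that -d needs both -f and -o.

```c
#include <stdlib.h>
#include <string.h>

struct options {
    int daemon;               /* -d */
    int noedit;               /* -e */
    int noretrieve;           /* -x */
    int format;               /* 1 = native (-t, -t1), 2 = BibTeX (-t2) */
    unsigned sleep_seconds;   /* -s, default 15 */
    const char *infile;       /* -f, NULL means stdin */
    const char *outfile;      /* -o, NULL means stdout */
    const char *journal_type; /* -t entry-type, default article */
    const char *other_type;   /* -u entry-type, default unpublished */
};

/* Returns 0 on success, -1 on an unknown or incomplete switch. */
static int parse_options(int argc, char *argv[], struct options *o) {
    int i;
    o->daemon = o->noedit = o->noretrieve = 0;
    o->format = 2;
    o->sleep_seconds = 15;
    o->infile = o->outfile = NULL;
    o->journal_type = "article";
    o->other_type = "unpublished";

    for (i = 1; i < argc; i++) {
        const char *a = argv[i];
        /* the following word, if any, when it isn't itself a switch */
        const char *next =
            (i + 1 < argc && argv[i + 1][0] != '-') ? argv[i + 1] : NULL;

        if (strcmp(a, "-d") == 0)       o->daemon = 1;
        else if (strcmp(a, "-e") == 0)  o->noedit = 1;
        else if (strcmp(a, "-x") == 0)  o->noretrieve = 1;
        else if (strcmp(a, "-t1") == 0) o->format = 1;
        else if (strcmp(a, "-t2") == 0) o->format = 2;
        else if (strcmp(a, "-t") == 0) {
            if (next != NULL) { o->journal_type = next; i++; }
            else              o->format = 1;
        } else if (next == NULL) {
            return -1;                  /* every remaining switch takes a value */
        } else if (strcmp(a, "-u") == 0) { o->other_type = next; i++; }
        else if (strcmp(a, "-f") == 0)   { o->infile = next; i++; }
        else if (strcmp(a, "-o") == 0)   { o->outfile = next; i++; }
        else if (strcmp(a, "-s") == 0) {
            o->sleep_seconds = (unsigned)strtoul(next, NULL, 10);
            i++;
        } else {
            return -1;                  /* unknown switch */
        }
    }
    if (o->daemon && (o->infile == NULL || o->outfile == NULL))
        return -1;                      /* -d requires both -f and -o */
    return 0;
}
```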

GPL License  

"My grandfather once told me there are two kinds of people:
    Those who do the work and those who take the credit.
    He told me to try to be in the first group; there was much less competition.
"
Indira Gandhi, the late Prime Minister of India

arXivBib's copyright is registered by me with the US Copyright Office, and I hereby license it to you under the terms and conditions of the GPL. There is no official support of any kind whatsoever, and you use arXivBib entirely at your own risk, with no guarantee of any kind, in particular with no warranty of merchantability.

By using arXivBib, you warrant that you have read, understood and agreed to these terms and conditions, and that you possess the legal right and ability to enter into this agreement and to use arXivBib in accordance with it.

Hopefully, the law and ethics regarding computer programs will evolve to make this kind of obnoxious banter unnecessary. In the meantime, please forgive me my paranoia.

To protect your own intellectual property, I recommend Copyright Basics from The Library of Congress, and similarly, Copyright Basics from The American Bar Association. Very briefly, download Form TX and follow the included instructions. In principle, you automatically own the copyright to anything you write the moment it's on paper. In practice, if the matter comes under dispute, the courts look _very_ favorably on you for demonstrating your intent by registering the copyright.

Concluding Remarks  

I hope you find arXivBib useful. If so, a contribution to your country's TeX Users Group, or to the GNU project, is suggested.


Copyright © 2005-2006, John Forkosh Associates, Inc.
email: john@forkosh.com