Delila Program: medlinebib

By downloading this code you agree to the
Source Code Use License (PDF).

Pascal source code: medlinebib.p (wget instructions)
Instructions on compiling
MacOS binary: medlinebib
delilabundle.zip = All Programs and MacOS Binaries
Copyright Statement for Delila Programs

Documentation for the medlinebib program is below, with links to related programs in the "see also" section.

{   version = 2.22; (* of medlinebib.p 2024 Feb 05}

(* begin module describe.medlinebib *)
(*
name
   medlinebib: convert medline Unix query format to bibtex format

synopsis
   medlinebib(query: in, medlinebibp: inout, bibformat: out, output: out)

files
   query:  The Medline format query file in Unix format created by Entrez.

   bibformat:  The reference in the query rendered in bibtex format.
      The title is wrapped according to variable titlelinesize.

   medlinebibp:  parameters to control the program.  The file must contain the
        following parameters, one per line:

      1. The version number of the program.  This allows the user to be
         warned if an old parameter file is used.

      2. If the first character of the second line is 'd' then the program
         runs in debugging mode.  This means that it will show the parts
         of the reference as it parses them.

      3. If the first character of the third line is 'e' then the program
         will create additional non-standard bibtex parts for the Medline
         components.  This will make a bulky entry, but it will contain
         all of the medline data.  Any cases of double quotes (") are
         converted to single quotes to protect the bibtex file.

      4. If the first character of the fourth line is 'f' then the program
         uses the final author to make the bibtex key.  Otherwise
         the second author is used (or none when there is only one author).

      5. If the first character of the fifth line is 'd' then the program
         will double dash page numbers: 1--5, otherwise it will single dash.

      6. The title line size, titlelinesize.  This is the number of
         characters that the lines will be wrapped to.

      7. If the first character of the seventh line is blank ' ', no date
         is written.  If the first character is 'f', then the full
         date and time is written in the format "2012/04/19 19:26:08"
         into a 'comment' just before the year field at the end of a
         bibliography entry.  This way one can automatically keep track
         of when one found a particular paper.

      8. If the query is missing some data, such as the volume or pages
         of an early publication, then this line contains the string to
         write in its place.

         Suggestion:

         1. Define this LaTeX command in your papers as:
         \newcommand{\todo}{\rule{0.5em}{1ex}}
         then use the string:
         \todo

         2. Define this LaTeX command in your papers as:
         \newcommand{\todobf}[1]{{\rule{0.5em}{1ex}\textbf{ #1}}}
         then use the string:
         \todobf{MISSING}

      Note:  as of version 1.47, medlinebibp will be automatically upgraded
      to include parameter 5 and any later parameters.  This means that
      medlinebib will read in the medlinebibp file and write it out again.

   output: messages to the user
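   The medlinebibp layout above can be read with a few lines of code.
   What follows is a minimal Python sketch, not medlinebib's own Pascal
   code; it assumes that the version number and titlelinesize are the
   first whitespace-separated token on their lines, and, unlike
   medlinebib, it simply rejects an older, shorter file instead of
   upgrading it.  The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MedlineBibParams:
    version: str        # line 1: program version the file was written for
    debug: bool         # line 2: 'd' = debugging mode
    extended: bool      # line 3: 'e' = extra non-standard Medline fields
    final_author: bool  # line 4: 'f' = key uses the final author
    double_dash: bool   # line 5: 'd' = pages written as 1--5
    titlelinesize: int  # line 6: width for wrapping titles
    full_date: bool     # line 7: 'f' = write a date/time comment
    missing: str        # line 8: string written in place of missing data


def parse_params(text: str) -> MedlineBibParams:
    """Parse the 8-line medlinebibp layout (hypothetical reader)."""
    lines = text.splitlines()
    if len(lines) < 8:
        raise ValueError("medlinebibp must have at least 8 lines")

    def starts(index: int, char: str) -> bool:
        # Only the first character of each flag line matters.
        return lines[index][:1] == char

    return MedlineBibParams(
        version=lines[0].split()[0],
        debug=starts(1, "d"),
        extended=starts(2, "e"),
        final_author=starts(3, "f"),
        double_dash=starts(4, "d"),
        titlelinesize=int(lines[5].split()[0]),
        full_date=starts(6, "f"),
        missing=lines[7].rstrip(),
    )
```

   For example, a file whose lines are 2.22, n, n, f, d, 65, f and \todo
   asks for final-author keys, double-dashed pages, 65-character title
   lines, a date comment and \todo for missing data.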

description

   Convert Medline format to bibtex format.

   The program takes a medline format in file 'query' and creates a bibtex
   file, 'bibformat'.
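   In the Medline format each line starts with a tag of up to four
   characters, padded with spaces to four columns and followed by "- "
   (for example "AU  - Rong M"), and long values continue on lines
   indented by six spaces.  A minimal Python sketch of reading such a
   record into (tag, value) pairs, keeping repeated tags such as AU in
   their original order, might look like this; it illustrates the
   format and is not medlinebib's parser.

```python
def parse_medline(text: str) -> list[tuple[str, str]]:
    """Read a Medline-format record into (tag, value) pairs in file order."""
    fields: list[tuple[str, str]] = []
    for line in text.splitlines():
        if line.startswith("      ") and fields:
            # Continuation line: append it to the value of the previous tag.
            tag, value = fields[-1]
            fields[-1] = (tag, value + " " + line.strip())
        elif len(line) >= 6 and line[4:6] == "- ":
            # Tag in columns 1-4 (space padded), then "- ", then the value.
            fields.append((line[:4].strip(), line[6:].strip()))
    return fields
```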

   While you can go to the trouble of downloading the medline format,
   I have revised the script (now called medquery) so that if one
   saves a page directly from pubmed it will be automatically
   converted.  When one saves a page it comes out as a 'query.fcgi'
   file (query.fcgi.html on my mac).  The medquery script searches
   through this and plucks out the PMID identifier.  Then it reaches
   across the internet using wget to obtain the medline format.  The
   medline format is then converted to bibtex.  This all happens so
   fast that the complexity doesn't matter.

   Note:  Spaces in names were originally converted to underscores for
   the key, but LaTeX/BibTeX objects to underscores (LaTeX treats an
   underscore as a math-mode subscript), so spaces were then
   (2011 Oct 05) converted to dashes.
   New rule 2013 Sep 11: spaces removed to make cut/paste easier.
   New rule 2014 Feb 25: ALL dashes are removed from the key names.
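   Under the current rules, spaces, dashes and quote marks all disappear
   from the names used in a key.  The Python sketch below strips them
   and assembles a key; the layout (first author, a dot, the second or
   final author, then the year) is inferred from keys quoted on this
   page such as Rong.McAllister1999, and the single-author form is an
   assumption.  key_name and make_key are hypothetical helpers.

```python
def key_name(surname: str) -> str:
    """Strip the characters the key rules remove: spaces, dashes, quotes."""
    return "".join(c for c in surname if c not in " -'\"")


def make_key(surnames: list[str], year: str, use_final: bool) -> str:
    """Build a key like Rong.McAllister1999 (layout inferred, not official)."""
    key = key_name(surnames[0])
    if len(surnames) > 1:
        # Parameter 4 of medlinebibp: 'f' picks the final author, else the second.
        other = surnames[-1] if use_final else surnames[1]
        key += "." + key_name(other)
    return key + year
```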

   To use the program:

   1. Set up atchange and wget on your computer, and run atchange in
      your home directory on an 'automate' file containing:

query.fcgi
  medquery

/tmp/query.fcgi
 echo moving query.fcgi to home for processing
 mv /tmp/query.fcgi ~

   2. Start at the PubMed web page

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi

   and retrieve a paper abstract.

   3.  Save the abstract.  This will create a query.fcgi file in your
   home directory.

   2005 May 11 Note:  Because pubmed keeps changing the format of
   their save mechanism, just save the page directly using your
   browser save mechanism.  It may be called query.fcgi.html or
   query.fcgi.  The medquery script will extract the PMID (PubMed ID)
   from the saved html page (using the name query.fcgi).  Hopefully
   this will be more stable, and it is certainly faster to save the
   page directly.

   4.  Creating the query.fcgi file will trigger atchange to run the
   medquery script, which converts from medline to BibTeX format.  The
   bibformat file will appear in your home directory.  Successive
   references are also stored in a 'bib' file in your home directory.
   The medlinebibp file is automatically created.  (The medquery script
   cleans up after itself by moving the medline format file into
   /tmp.)

   5.  In the directory where you keep your references you can have a
   link to the resulting bib file.  Of course there are other ways of
   automating this, but for me it makes the conversion rather rapid.  I
   just go to my reference directory, edit my bib file and read in the
   new entry.

   Note: medlinebib changes page numbers in the form 507-10 to the
   form 507--510.

   Note: To generate the BibTeX key, medlinebib originally converted
   spaces in author names to '-'; as of 2014 Feb 19 the spaces are
   removed.  It also removes single (') and double (") quotes from the
   names.  This allows one to use the names on the command line without
   having to type the other kind of quote mark.  The names of the
   authors in the reference are not affected.

   6.  PubMed now supplies the DOI link, so medlinebib creates a note:

   note = "\url
   {https://doi.org/10.1038/ncomms8486}",

   To use this in LaTeX, one must \usepackage{html} in the preamble.

   Note that the tag for these lines in medline format is 'LID' or
   'AID', and a reference can have several of them:

LID - 10.1016/j.jtbi.2015.01.042 [doi]
LID - S0022-5193(15)00059-4 [pii]
LID - arXiv:2301.00783v2

   So the program needs to extract only the doi.

   In the case of arXiv the format to use instead of a doi is:
https://arxiv.org/abs/2301.00783v2
or better:
https://doi.org/10.48550/arXiv.2301.00783
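   Picking the doi out of several LID/AID values, with the arXiv
   fallback just described, could be sketched in Python as follows.  It
   assumes the values arrive without their "LID - " prefix, handles only
   new-style arXiv identifiers of the form 2301.00783, and drops the
   version suffix as in the doi.org form above; doi_url is a
   hypothetical name, not part of medlinebib.

```python
import re
from typing import Optional


def doi_url(lid_values: list[str]) -> Optional[str]:
    """Pick a DOI link from LID/AID values; fall back to an arXiv DOI."""
    for value in lid_values:
        match = re.fullmatch(r"(\S+) \[doi\]", value.strip())
        if match:
            return "https://doi.org/" + match.group(1)
    for value in lid_values:
        # New-style arXiv identifiers only; the version suffix is dropped.
        match = re.fullmatch(r"arXiv:(\d{4}\.\d{4,5})(v\d+)?", value.strip())
        if match:
            return "https://doi.org/10.48550/arXiv." + match.group(1)
    return None
```

   The returned link is what would go inside \url{...} in the note
   field; values tagged [pii] are ignored.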

examples

   Try searching for

      Schneider TD

documentation

see also

   Unix csh script: medquery

   atchange is described at:
   https://alum.mit.edu/www/toms/atchange.html

   wget information:
   https://alum.mit.edu/www/toms/wget.html
   http://www.gnu.org/software/wget/

   Sort the bibtex file alphabetically: sortbibtex.p

   To find a reference quickly from the Year, Volume and Page,
   you can identify it in PubMed using the yvp script:
   https://alum.mit.edu/www/toms/yvp.html

   ---------

   Pubmed link:
   http://www.ncbi.nlm.nih.gov/entrez/query.fcgi

   PubMed Help:
   http://www.ncbi.nlm.nih.gov/books/NBK3830/

   Changes to PubMed are announced in the NLM Technical Bulletin:
   https://www.nlm.nih.gov/pubs/techbull/tb.html

   PubMed modifications are also announced in the PubMed
   New/Noteworthy RSS Feed:
   http://www.ncbi.nlm.nih.gov/feed/rss.cgi?ChanKey=PubMedNews

   MEDLINE/PubMed Data Element (Field) Descriptions - full description:
   http://www.nlm.nih.gov/bsd/mms/medlineelements.html

   ---------

   Parameter file: medlinebibp

   Belorussian translation of this page by Bohdan Zograf:
   http://www.designcontest.com/show/medlinebib-program-be

author

   Thomas Dana Schneider

bugs

********************************************************************************
   If there are too many names, Entrez says "et al" for the
   last name.  This gets represented as:

 and a. l. et",

   Who is Al L. Et?  :-)

   It should be recognized and made:

 and {\em et al}",
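   Checking for the "et al" entry before a value is split into surname
   and initials avoids the problem.  A minimal Python sketch, assuming
   AU values such as "Rong M" and the author layout of the Biotechniques
   example entry further down this page; bib_authors is hypothetical and
   not part of medlinebib:

```python
def bib_authors(au_values: list[str]) -> str:
    """Format AU values as a bibtex author list, keeping "et al" intact."""
    names = []
    for value in au_values:
        value = value.strip()
        if value.lower() in ("et al", "et al."):
            # Recognize Entrez's "et al" before it is split like a name.
            names.append(r"{\em et al}")
            continue
        surname, _, initials = value.rpartition(" ")
        if not surname:
            names.append(value)  # single word: leave it as given
            continue
        given = " ".join(letter + "." for letter in initials)
        names.append(given + " " + surname)
    return "\n and ".join(names)
```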

********************************************************************************

 Authors with names like:

    La Branche H

 should be processed to "LaBranche".
 The only way to recognize this is the small case letters in
 the second part of the last name - rather subtle.
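 The letter-case rule described here can be sketched in Python: a
 token made only of capital letters is taken as initials, and every
 other token as part of the surname, so removing the spaces from the
 surname gives LaBranche.  This is an illustration under that
 assumption, not medlinebib's code; split_author is a hypothetical
 name, and a surname written entirely in capitals would be misread as
 initials.

```python
def is_initials(token: str) -> bool:
    """Medline initials are written as capital letters only, e.g. H or WT."""
    return token.isalpha() and token.isupper()


def split_author(value: str) -> tuple[str, str]:
    """Split an AU value into (surname, initials) using letter case."""
    tokens = value.split()
    surname = [t for t in tokens if not is_initials(t)]
    initials = [t for t in tokens if is_initials(t)]
    return " ".join(surname), "".join(initials)
```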

********************************************************************************

If you make the medlinebib program smart enough to re-format the
reference titles to less than 80 characters per line in the
output bibformat file, then the sortbibtex program will run flawlessly
using it as the input file. Otherwise, it gets hung on the title lines
that are greater than 80 chars/line.
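Wrapping is an ordinary word wrap.  A minimal Python sketch using the
standard textwrap module, with the title = "{...}", layout taken from
the example entry on this page and the width standing in for
titlelinesize; bib_title is a hypothetical helper, and because it never
breaks a word, a single word longer than the width still produces a
long line:

```python
import textwrap


def bib_title(title: str, titlelinesize: int) -> str:
    """Wrap a bibtex title field so no line is longer than titlelinesize."""
    field = 'title = "{' + title + '}",'
    lines = textwrap.wrap(
        field,
        width=titlelinesize,
        break_long_words=False,  # never split a word across lines
        break_on_hyphens=False,  # keep hyphenated words whole
    )
    return "\n".join(lines)
```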

********************************************************************************

   1998 Jan 11
   Bielinsky.Gerbi1998 is a case in which [In Process Citation] goes from one
   line to the next; the program does not handle this yet.

********************************************************************************

2000 Aug 17

The program does not fix page numbers if there is more material:

pages = "233-44; discussion 244-50",

2005 Nov 04.  A special case of this occurs in Biotechniques because
they often have advertisements in the middle of the paper.  For
example:

@article{Rong.McAllister1999,
author = "M. Rong
 and R. Castagna
 and W. T. McAllister",
title = "{Cloning and purification of bacteriophage K11 RNA polymerase}",
journal = "Biotechniques",
volume = "27",
pages = "690--2, 694",
pmid = "10524308",
year = "1999"}

Such cases are too complex for this pea brain program to handle so it
does nothing and thereby avoids messing up the page numbers.  Note:
the original page number string at pubmed, '690-2, 694' is incorrect.
It removes 693 (which is an ad) but not 691 (which is also an ad).
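The page rule from the notes above (507-10 becomes 507--510, or
507-510 when single dashes are requested) can be sketched in Python.
This conservative version, which is not medlinebib's code, only touches
a single number-number range; unlike the Biotechniques entry above,
where the first dash was still doubled, it returns anything more complex
unchanged.  fix_pages is a hypothetical name.

```python
import re


def fix_pages(pages: str, double_dash: bool = True) -> str:
    """Expand a simple range like 507-10 to 507--510 (or 507-510).

    Anything that is not a single number-number range, such as
    "233-44; discussion 244-50" or "690-2, 694", is returned unchanged.
    """
    match = re.fullmatch(r"(\d+)-(\d+)", pages.strip())
    if match is None:
        return pages
    first, last = match.groups()
    if len(last) < len(first):
        # Restore the leading digits that Medline abbreviates away.
        last = first[: len(first) - len(last)] + last
    return first + ("--" if double_dash else "-") + last
```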

********************************************************************************

technical notes

   The entire title is surrounded by {} to protect capitalized words.  (Done
   1997 March 20)

   Medline insists on inserting " [In Process Citation]" into the title of new
   partially completed (?) references.  The program removes this string when
   it is found at the end of the title. (Done 1997 June 14)
   See bug note above.
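   Removing the marker is easy once continuation lines have been joined,
   which is also one way to handle the case in the bug note where the
   marker is split across two lines.  A minimal Python sketch, assuming
   the raw title text still contains its line breaks and indentation;
   clean_title is hypothetical:

```python
MARKER = "[In Process Citation]"


def clean_title(raw: str) -> str:
    """Remove a trailing [In Process Citation] from a Medline title."""
    # Collapse line breaks and continuation indentation into single spaces,
    # so a marker split across two lines still matches.
    title = " ".join(raw.split())
    if title.endswith(MARKER):
        title = title[: -len(MARKER)].rstrip()
    return title
```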

   1998 June 30: The program now handles Jr cases such as
      AU  - Kazazian HH Jr
   by combining the Jr with the last name in the bibliography (as HH
   {Kazazian Jr}) and by dropping it from the keyname.
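   The Jr rule can be sketched in Python as below.  It assumes the
   initials are written with periods, as in the example bibtex entry on
   this page (so HH becomes H. H.), and that Jr appears only as the
   final token; jr_author is a hypothetical helper, not medlinebib's
   code.

```python
def jr_author(value: str) -> tuple[str, str]:
    """Return (bibtex name, key name) for an AU value, handling a trailing Jr."""
    tokens = value.split()
    junior = len(tokens) > 2 and tokens[-1] == "Jr"
    if junior:
        tokens = tokens[:-1]
    surname = " ".join(tokens[:-1])
    given = " ".join(letter + "." for letter in tokens[-1])
    if junior:
        # Braces make BibTeX treat "Kazazian Jr" as one last name
        # instead of taking "Jr" alone as the last name.
        bib_name = given + " {" + surname + " Jr}"
    else:
        bib_name = given + " " + surname
    return bib_name, surname.replace(" ", "")  # Jr never reaches the key
```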

   1999 Sep 5:  I upgraded mq to the medquery script.  This script uses wget
   to grab the medline format.  This means that you can get a pubmed
   reference and just save it.  Medquery doesn't care whether you save it as
   mac, pc or unix, and it will get the medline format by wget.  Then it
   converts to bibtex format.  So you only have to click on save twice - it's
   much faster!

   2000 July 27: The old medline link

http://www4.ncbi.nlm.nih.gov/Entrez/medline.html

   is no longer active.  It produced a "query" file.  The old link
   automatically takes one to the new location:

http://www4.ncbi.nlm.nih.gov/entrez/query.fcgi

   This produces a "query.fcgi" file.

   2011 Oct 21:  An empty author list can be legitimate; see PubMed 2410266
   http://www.ncbi.nlm.nih.gov/pubmed/2410266
   Eur J Biochem. 1985 Jul 1;150(1):1-5.
   Nomenclature Committee of the International Union of Biochemistry
   (NC-IUB). Nomenclature for incompletely specified bases in nucleic
   acid sequences. Recommendations 1984.
   [No authors listed]

2013 Jun 13, 1.99: PMID 15903968 gives two cases of
Ingber DE.  Previous use of IR for PMID 21051339 crashed.
google: medline IR
http://www.nlm.nih.gov/bsd/mms/medlineelements.html
MEDLINE®/PubMed® Data Element (Field) Descriptions
IR, FIR = Investigator Name and Full Investigator Name
AU, FAU = Author, Full Author
So IR is not appropriate for a reference.
BUT PMID 21051339 has no authors!
To keep cases with no authors, no change to code.  There will be duplicates
when there is an AU and an IR for the same person.  CODE COULD BE WRITTEN
TO LOOK FOR THIS using equalstring.
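One way to write that check, sketched in Python with set membership
standing in for the program's equalstring routine: every AU name is
kept, and an IR name is added only when it matches no AU name and has
not already been added.  It assumes IR and AU values use the same short
form (such as "Ingber DE"), and author_names is a hypothetical name.
When there are no AU lines, as for PMID 21051339, the IR names alone
are returned.

```python
def author_names(fields: list[tuple[str, str]]) -> list[str]:
    """Collect AU names, adding IR names only when they are not duplicates."""
    au_names = {value for tag, value in fields if tag == "AU"}
    names: list[str] = []
    seen_ir: set[str] = set()
    for tag, value in fields:
        if tag == "AU":
            names.append(value)  # authors are always kept, even repeats
        elif tag == "IR" and value not in au_names and value not in seen_ir:
            names.append(value)
            seen_ir.add(value)
    return names
```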

*)
(* end module describe.medlinebib *)
{This manual page was created by makman 1.45}
{created by htmlink 1.62}