By downloading this code you agree to the
Source Code Use License (PDF).
{ version = 2.22; (* of medlinebib.p 2024 Feb 05}
(* begin module describe.medlinebib *)
(*
name
medlinebib: convert medline Unix query format to bibtex format
synopsis
medlinebib(query: in, medlinebibp: inout, bibformat: out, output: out)
files
query: The Medline format query file in Unix format created by Entrez.
bibformat: The reference in the query rendered in bibtex format.
The title is wrapped according to variable titlelinesize.
medlinebibp: parameters to control the program. The file must contain the
following parameters, one per line:
1. The version number of the program. This allows the user to be
warned if an old parameter file is used.
2. If the first character of the second line is 'd' then the program
runs in debugging mode. This means that it will show the parts
of the reference as it parses them.
3. If the first character of the third line is 'e' then the program
will create additional non-standard bibtex parts for the Medline
components. This will make a bulky entry, but it will contain
all of the medline data. Any cases of double quotes (") are
converted to single quotes to protect the bibtex file.
4. If the first character of the fourth line is 'f' then the program
uses the final author to make the bibtex key. Otherwise
the second author is used (or none when there is only one author).
5. If the first character of the fifth line is 'd' then the program
will double dash page numbers: 1--5, otherwise it will single dash.
6. The title line size, titlelinesize. This is the number of
characters that the lines will be wrapped to.
7. If the first character of the seventh line is blank ' ', no date is
written. If the first character is 'f', then the full
date and time is written in the format "2012/04/19 19:26:08"
into a 'comment' just before the final year of a bibliography
entry. This way one can automatically keep track of when one
found a particular paper.
8. If the query is missing some data such as volume or pages in
an early publication, then this line contains the string
to give instead.
Suggestion:
1. Define this LaTeX command in your papers as:
\newcommand{\todo}{\rule{0.5em}{1ex}}
then use the string:
\todo
2. Define this LaTeX command in your papers as:
\newcommand{\todobf}[1]{{\rule{0.5em}{1ex}\textbf{ #1}}}
then use the string:
\todobf{MISSING}
Note: as of version 1.47, medlinebibp will be automatically upgraded
to include parameter 5 and any later parameters. This means that
medlinebib will read in the medlinebibp file and write it out again.
output: messages to the user
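The medlinebibp layout described above can be sketched as a small reader.
This is only an illustration in Python, not the program's own Pascal code.
The names MedlinebibParams and parse_params are hypothetical, and the sketch
assumes that the value on lines 1 and 6 is the first whitespace-separated
token and that line 8 holds nothing but the replacement string.

```python
from dataclasses import dataclass


@dataclass
class MedlinebibParams:
    version: float          # line 1: program version the file was written for
    debug: bool             # line 2: 'd' turns on debugging output
    extended: bool          # line 3: 'e' adds non-standard Medline fields
    final_author_key: bool  # line 4: 'f' uses the final author in the key
    double_dash: bool       # line 5: 'd' writes page ranges as 1--5
    titlelinesize: int      # line 6: wrap width for the title
    full_date: bool         # line 7: 'f' writes the full date and time
    missing: str            # line 8: string used for missing data


def parse_params(text: str) -> MedlinebibParams:
    lines = text.splitlines()
    if len(lines) < 8:
        raise ValueError("expected 8 parameter lines, got %d" % len(lines))

    def first_char(n: int) -> str:
        # First character of parameter line n (1-based); '' for an empty line.
        return lines[n - 1][:1]

    return MedlinebibParams(
        version=float(lines[0].split()[0]),   # assumption: first token is the number
        debug=first_char(2) == "d",
        extended=first_char(3) == "e",
        final_author_key=first_char(4) == "f",
        double_dash=first_char(5) == "d",
        titlelinesize=int(lines[5].split()[0]),
        full_date=first_char(7) == "f",
        missing=lines[7].strip(),             # assumption: whole line is the string
    )
```

A file shorter than eight lines is rejected here, whereas medlinebib itself
upgrades an old parameter file as noted above.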
description
Convert Medline format to bibtex format.
The program takes a medline format in file 'query' and creates a bibtex
file, 'bibformat'.
While you can go to the trouble of downloading the medline format,
I have revised the script (now called medquery) so that if one
saves a page directly from pubmed it will be automatically
converted. When one saves a page it comes out as a 'query.fcgi'
file (query.fcgi.html on my mac). The medquery script searches
through this and plucks out the PMID identifier. Then it reaches
across the internet using wget to obtain the medline format. The
medline format is then converted to bibtex. This all happens so
fast that the complexity doesn't matter.
Note: Spaces in names were originally converted to underscores for
the key, but LaTeX/BibTeX objects (since an underscore means
mathematics) so spaces are now (2011 Oct 05) converted to dashes.
New rule 2013 Sep 11: spaces removed to make cut/paste easier.
New rule 2014 Feb 25: ALL dashes are removed from the key names.
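The key rules above can be sketched in Python as follows. This is an
illustration only, not the program's own code. The names last_name, key_part
and make_key are hypothetical. The 'First.Second' plus year shape is taken
from keys quoted elsewhere in this page (Rong.McAllister1999), and the
single-author form without a dot is an assumption.

```python
def last_name(au: str) -> str:
    """Last name from a Medline AU value such as 'McAllister WT'."""
    parts = au.split()
    if len(parts) > 1 and parts[-1] == "Jr":
        parts = parts[:-1]              # Jr is dropped from the key name
    if len(parts) > 1:
        parts = parts[:-1]              # drop the trailing initials
    return " ".join(parts)


def key_part(name: str) -> str:
    # Spaces, dashes, and single and double quotes are removed for the key.
    return "".join(ch for ch in name if ch not in " -'\"")


def make_key(authors, year, use_final=False) -> str:
    if not authors:
        raise ValueError("no authors: this sketch does not cover that case")
    first = key_part(last_name(authors[0]))
    if len(authors) == 1:
        return first + str(year)        # assumption: no dot for a single author
    other = authors[-1] if use_final else authors[1]
    return first + "." + key_part(last_name(other)) + str(year)
```

For example, make_key(["Rong M", "Castagna R", "McAllister WT"], 1999,
use_final=True) gives Rong.McAllister1999, and the same call without
use_final gives Rong.Castagna1999.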
To use the program:
1. Set up atchange and wget on your computer, then start atchange
running in your home directory on an 'automate' file containing:
query.fcgi
medquery
/tmp/query.fcgi
echo moving query.fcgi to home for processing
mv /tmp/query.fcgi ~
2. Start at the PubMed web page
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi
and retrieve a paper abstract.
3. Save the abstract. This will create a query.fcgi file in your
home directory.
2005 May 11 Note: Because pubmed keeps changing the format of
their save mechanism, just save the page directly using your
browser save mechanism. It may be called query.fcgi.html or
query.fcgi. The medquery script will extract the PMID (PubMed ID)
from the saved html page (using the name query.fcgi). Hopefully
this will be more stable, and it is certainly faster to save the
page directly.
4. Creating the query.fcgi file will trigger atchange to run the
medquery script, which converts from medline to BibTeX format. The
bibformat file will appear in your home directory. Successive
references are also stored in a 'bib' file in your home directory.
The medlinebibp file is automatically created. (The medquery script
will clean up after itself by putting the medline format file into
/tmp.)
5. In the directory where you keep your references
you can have a pointer to the resulting bib file. Of course there
are other ways of automating this, but for me it makes the
conversion rather rapid. I just go to my reference directory, edit
my bib file and read in the new entry.
Note: medlinebib changes page numbers in the form 507-10 to the
form 507--510.
Note: To generate the BibTeX key, medlinebib converts spaces to
'-' in author names and removes single (') and double (") quotes
from the names. This allows one to use the names on the command
line without having to type the other kind of quote mark. The names
of the authors in the reference are not affected.
As of 2014 Feb 19 the spaces are removed instead.
6. PubMed now supplies the DOI link, so medlinebib creates a note:
note = "\url
{https://doi.org/10.1038/ncomms8486}",
To use this in LaTeX, one must \usepackage{html} in the preamble.
Note that the key for these lines in medline format is 'LID' or
'AID' and there can be several at the same time:
LID - 10.1016/j.jtbi.2015.01.042 [doi]
LID - S0022-5193(15)00059-4 [pii]
LID - arXiv:2301.00783v2
So the program needs to extract only the doi.
In the case of arXiv the format to use instead of a doi is:
https://arxiv.org/abs/2301.00783v2
or better:
https://doi.org/10.48550/arXiv.2301.00783
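The selection among several LID or AID lines can be sketched in Python as
below. This is an illustration under stated assumptions, not the program's
code: the function name doi_url is hypothetical, a doi is recognized by the
trailing '[doi]' label, and an arXiv identifier is used only when no doi is
present.

```python
import re


def doi_url(medline_lines):
    """Return a https://doi.org/ URL from LID/AID lines, or None if absent."""
    arxiv_url = None
    for line in medline_lines:
        tag, sep, value = line.partition("-")
        if not sep or tag.strip() not in ("LID", "AID"):
            continue
        value = value.strip()
        if value.endswith("[doi]"):
            # A labelled doi wins over anything else.
            return "https://doi.org/" + value[: -len("[doi]")].strip()
        if value.startswith("arXiv:") and arxiv_url is None:
            rest = value[len("arXiv:"):].split()
            if rest:
                # Drop the version suffix (v2) to form the arXiv doi.
                ident = re.sub(r"v\d+$", "", rest[0])
                arxiv_url = "https://doi.org/10.48550/arXiv." + ident
    return arxiv_url
```

Lines labelled '[pii]' are skipped, so a record carrying only a pii yields
None and no note would be written.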
examples
Try searching for
Schneider TD
documentation
see also
Unix csh script: medquery
atchange is described at:
https://alum.mit.edu/www/toms/atchange.html
wget information:
https://alum.mit.edu/www/toms/wget.html
http://www.gnu.org/software/wget/
Sort the bibtex file alphabetically: sortbibtex.p
To find a reference quickly from the Year, Volume and Page,
you can identify it in PubMed using the yvp script:
https://alum.mit.edu/www/toms/yvp.html
---------
Pubmed link:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi
PubMed Help:
http://www.ncbi.nlm.nih.gov/books/NBK3830/
Changes to PubMed are announced in the NLM Technical Bulletin:
https://www.nlm.nih.gov/pubs/techbull/tb.html
PubMed modifications are also delineated in the PubMed New/Noteworthy
RSS Feed:
http://www.ncbi.nlm.nih.gov/feed/rss.cgi?ChanKey=PubMedNews
MEDLINE/PubMed Data Element (Field) Descriptions - full description:
http://www.nlm.nih.gov/bsd/mms/medlineelements.html
---------
Parameter file: medlinebibp
Belorussian translation of this page by Bohdan Zograf:
http://www.designcontest.com/show/medlinebib-program-be
author
Thomas Dana Schneider
bugs
********************************************************************************
If there are too many names, Entrez says "et al" for the
last name. This gets represented as:
and a. l. et",
Who is Al L. Et? :-)
It should be recognized and made:
and {\em et al}",
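One possible fix, sketched in Python rather than in the program's Pascal:
test for the literal 'et al' before splitting the name into last name and
initials. The function name format_author is hypothetical, and treating the
final token as the initials is an assumption drawn from the examples in
this page.

```python
def format_author(au: str) -> str:
    """Render a Medline AU value such as 'McAllister WT' as 'W. T. McAllister'."""
    if au.strip().rstrip(".").lower() == "et al":
        # Special case: not a person, so do not split it into initials.
        return r"{\em et al}"
    parts = au.split()
    if len(parts) < 2:
        return au.strip()
    # The final token holds the initials, one letter each.
    initials = " ".join(c + "." for c in parts[-1])
    return initials + " " + " ".join(parts[:-1])
```

Without the special case, 'et al' would be split into the last name 'et'
and the initials 'al', which is how 'a. l. et' arises.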
********************************************************************************
Authors with names like:
La Branche H
should be processed to "LaBranche".
The only way to recognize this is the lowercase letters in
the second part of the last name - rather subtle.
********************************************************************************
If you make the medlinebib program smart enough to re-format the
reference titles to less than 80 characters per line in the
output bibformat file, then the sortbibtex program will run flawlessly
using it as the input file. Otherwise, it gets hung on the title lines
that are greater than 80 chars/line.
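A minimal sketch of such wrapping, in Python with the standard textwrap
module, assuming the wrap width is the titlelinesize parameter. The function
name wrap_title is hypothetical and the field layout imitates the entries
shown in this page.

```python
import textwrap


def wrap_title(title: str, titlelinesize: int = 70) -> str:
    """Wrap a braced bibtex title field to at most titlelinesize characters."""
    # The closing brace, quote and comma are attached to the last word so
    # that they count toward the line length as well.
    return textwrap.fill(
        "{" + title + '}",',
        width=titlelinesize,
        initial_indent='title = "',
        break_long_words=False,   # never split a word; such a line may run long
        break_on_hyphens=False,
    )
```

Lines stay within titlelinesize unless a single word is itself longer than
that, in which case the word is left whole.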
********************************************************************************
1998 Jan 11
Bielinsky.Gerbi1998 is a case in which [In Process Citation] goes from one
line to the next; the program does not handle this yet.
********************************************************************************
2000 Aug 17
The program does not fix page numbers if there is more material:
pages = "233-44; discussion 244-50",
2005 Nov 04. A special case of this occurs in Biotechniques because
they often have advertisements in the middle of the paper. For
example:
@article{Rong.McAllister1999,
author = "M. Rong
and R. Castagna
and W. T. McAllister",
title = "{Cloning and purification of bacteriophage K11 RNA polymerase}",
journal = "Biotechniques",
volume = "27",
pages = "690--2, 694",
pmid = "10524308",
year = "1999"}
Such cases are too complex for this pea brain program to handle so it
does nothing and thereby avoids messing up the page numbers. Note:
the original page number string at pubmed, '690-2, 694' is incorrect.
It removes 693 (which is an ad) but not 691 (which is also an ad).
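The simple case, 507-10 becoming 507--510, can be sketched in Python as
below. This is an illustration, not the program's code, and the name
expand_pages is hypothetical. In this sketch any string that is not a plain
'first-last' pair of numbers is returned untouched, which may differ in
detail from what medlinebib does with the dash in such strings.

```python
import re

SIMPLE_RANGE = re.compile(r"^(\d+)-(\d+)$")


def expand_pages(pages: str, double_dash: bool = True) -> str:
    """Expand an abbreviated page range such as '507-10' to '507--510'."""
    m = SIMPLE_RANGE.match(pages.strip())
    if not m:
        return pages                      # too complex: leave it alone
    first, last = m.groups()
    if len(last) < len(first):
        # Borrow the missing leading digits from the first page.
        last = first[: len(first) - len(last)] + last
    return first + ("--" if double_dash else "-") + last
```

With this, '690-2, 694' and '233-44; discussion 244-50' come back
unchanged.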
********************************************************************************
technical notes
The entire title is surrounded by {} to protect capitalized words. (Done
1997 March 20)
Medline insists on inserting " [In Process Citation]" into the title of new
partially completed (?) references. The program removes this string when
it is found at the end of the title. (Done 1997 June 14)
See bug note above.
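One way to cover the multi-line case as well, sketched in Python: join the
title's continuation lines and collapse whitespace before testing for the
marker. The function name strip_in_process is hypothetical and this is not
the program's own code.

```python
MARKER = "[In Process Citation]"


def strip_in_process(title: str) -> str:
    """Remove a trailing '[In Process Citation]' from a Medline title."""
    # Collapsing whitespace also rejoins a marker split across two lines.
    collapsed = " ".join(title.split())
    if collapsed.endswith(MARKER):
        collapsed = collapsed[: -len(MARKER)].rstrip()
    return collapsed
```

The marker is removed only at the end of the title, matching the rule
stated above.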
1998 June 30: The program now handles Jr cases such as
AU - Kazazian HH Jr
by combining the Jr with the last name in the bibliography (as HH
{Kazazian Jr}) and by dropping it from the keyname.
1999 Sep 5: I upgraded mq to the medquery script. This script uses wget
to grab the medline format. This means that you can get a pubmed
reference and just save it. Medquery doesn't care whether you save it as
mac, pc or unix, and it will get the medline format by wget. Then it
converts to bibtex format. So you only have to click on save twice - it's
much faster!
2000 July 27: The old medline link
http://www4.ncbi.nlm.nih.gov/Entrez/medline.html
is no longer active. It produced a "query" file. This automatically
takes one to the new location:
http://www4.ncbi.nlm.nih.gov/entrez/query.fcgi
This produces a "query.fcgi" file.
2011 Oct 21: No author list can be legitimate, see: PubMed 2410266
http://www.ncbi.nlm.nih.gov/pubmed/2410266
Eur J Biochem. 1985 Jul 1;150(1):1-5.
Nomenclature Committee of the International Union of Biochemistry
(NC-IUB). Nomenclature for incompletely specified bases in nucleic
acid sequences. Recommendations 1984.
[No authors listed]
2013 Jun 13, 1.99: PMID 15903968 gives two cases of
Ingber DE. Previous use of IR for PMID 21051339 crashed.
google: medline IR
http://www.nlm.nih.gov/bsd/mms/medlineelements.html
MEDLINE/PubMed Data Element (Field) Descriptions
IR, FIR = Investigator Name and Full Investigator Name
AU, FAU = Author, Full Author
So IR is not appropriate for a reference.
BUT PMID 21051339 has no authors!
To keep cases with no authors, no change to code. There will be duplicates
when there is an AU and an IR for the same person. CODE COULD BE WRITTEN
TO LOOK FOR THIS using equalstring.
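Such a duplicate check could look like the following Python sketch. It is
an illustration only: the name merge_authors is hypothetical, and comparing
names after collapsing whitespace and ignoring case is an assumption about
what counts as the same person.

```python
def merge_authors(au_names, ir_names):
    """Append IR names to the AU list, skipping names already present."""

    def norm(name: str) -> str:
        return " ".join(name.split()).lower()

    seen = {norm(name) for name in au_names}
    merged = list(au_names)
    for name in ir_names:
        if norm(name) not in seen:
            seen.add(norm(name))
            merged.append(name)
    return merged
```

A record with no AU lines still keeps its IR names, so the no-author case
is preserved.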
*)
(* end module describe.medlinebib *)
{This manual page was created by makman 1.45}
{created by htmlink 1.62}