Here's a simple little script to query PubMed for a Digital Object Identifier (DOI).

Usage is quite simple: find a DOI somewhere, e.g. 10.1038/nature02029 (for this groundbreaking paper), and run this:

CODE:
  lurch:~ python pythonquery.py 10.1038/nature02029

... and via the magic of webservices and XML, and with a bit of luck, you'll get something like this back:

CODE:
  Language-tree divergence times support the Anatolian theory of Indo-European origin.

  Gray, RD, Atkinson, QD
  Nature 2003, 426 (6965):435-9

  Languages, like genes, provide vital clues about human history. The origin of
  the Indo-European language family is "the most intensively studied, yet still
  most recalcitrant, problem of historical linguistics". Numerous genetic studies
  of Indo-European origins have also produced inconclusive results. Here we
  analyse linguistic data using computational methods derived from evolutionary
  biology. We test two theories of Indo-European origin: the 'Kurgan expansion'
  and the 'Anatolian farming' hypotheses. The Kurgan theory centres on possible
  archaeological evidence for an expansion into Europe and the Near East by
  Kurgan horsemen beginning in the sixth millennium BP. In contrast, the Anatolian
  theory claims that Indo-European languages expanded with the spread of
  agriculture from Anatolia around 8,000-9,500 years bp. In striking agreement
  with the Anatolian hypothesis, our analysis of a matrix of 87 languages with
  2,449 lexical items produced an estimated age range for the initial Indo-European
  divergence of between 7,800 and 9,800 years bp. These results were robust to
  changes in coding procedures, calibration points, rooting of the trees and priors
  in the bayesian analysis.
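
For the curious, the "magic" is two calls to NCBI's E-utilities: ESearch turns the DOI into a PubMed ID, and EFetch returns the full record for that ID. Here is a rough sketch of the first step that runs without touching the network. The helper names `esearch_url` and `first_pmid` are made up for illustration, and the sample response is hand-written with a placeholder ID, not real PubMed output.

```python
from urllib.parse import urlencode
from xml.dom import minidom

ESEARCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?'

# Hand-written stand-in for an ESearch reply; the ID is a placeholder.
SAMPLE_ESEARCH = """<eSearchResult>
  <Count>1</Count>
  <IdList><Id>12345678</Id></IdList>
</eSearchResult>"""


def esearch_url(doi, database='pubmed'):
    """Build the ESearch URL that asks PubMed for records matching a DOI."""
    return ESEARCH + urlencode({'db': database, 'term': doi, 'retmax': 1})


def first_pmid(xml):
    """Return the text of the first <Id> element, or None if there is none."""
    ids = minidom.parseString(xml).getElementsByTagName('Id')
    if not ids:
        return None
    return ids[0].childNodes[0].data


print(esearch_url('10.1038/nature02029'))
print(first_pmid(SAMPLE_ESEARCH))
```

The ID that comes back is then handed to efetch.fcgi in the same way to get the citation XML.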

The Code:

PYTHON:
  #!/usr/bin/env python3

  #   Simple script to query pubmed for a DOI
  #   (c) Simon Greenhill, 2007
  #   http://simon.net.nz/

  from urllib.parse import urlencode
  from urllib.request import urlopen
  from xml.dom import minidom

  EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/'

  class DoiNotFound(Exception):
      """Raised when PubMed has no record for the DOI"""

  def get_citation_from_doi(query, email='YOUR EMAIL GOES HERE', tool='SimonsPythonQuery', database='pubmed'):
      """Looks up a DOI with esearch, then returns the efetch XML for the first hit"""
      params = {
          'db': database,
          'tool': tool,
          'email': email,
          'term': query,
          'usehistory': 'y',
          'retmax': 1
      }
      # try to resolve the PubMed ID of the DOI
      url = EUTILS + 'esearch.fcgi?' + urlencode(params)
      data = urlopen(url).read()

      # parse XML output from PubMed...
      xmldoc = minidom.parseString(data)
      ids = xmldoc.getElementsByTagName('Id')

      # nothing found, give up
      if len(ids) == 0:
          raise DoiNotFound(query)

      # get ID
      pmid = ids[0].childNodes[0].data

      # remove unwanted parameters
      params.pop('term')
      params.pop('usehistory')
      params.pop('retmax')
      # and add new ones...
      params['id'] = pmid

      params['retmode'] = 'xml'

      # get citation info:
      url = EUTILS + 'efetch.fcgi?' + urlencode(params)
      data = urlopen(url).read()

      return data

  def text_output(xml):
      """Makes a simple text output from the XML returned from efetch"""

      xmldoc = minidom.parseString(xml)

      title = xmldoc.getElementsByTagName('ArticleTitle')[0]
      title = title.childNodes[0].data

      abstract = xmldoc.getElementsByTagName('AbstractText')[0]
      abstract = abstract.childNodes[0].data

      # build a list of "LastName, Initials" strings
      authors = xmldoc.getElementsByTagName('AuthorList')[0]
      authors = authors.getElementsByTagName('Author')
      authorlist = []
      for author in authors:
          LastName = author.getElementsByTagName('LastName')[0].childNodes[0].data
          Initials = author.getElementsByTagName('Initials')[0].childNodes[0].data
          author = '%s, %s' % (LastName, Initials)
          authorlist.append(author)

      journalinfo = xmldoc.getElementsByTagName('Journal')[0]
      journal = journalinfo.getElementsByTagName('Title')[0].childNodes[0].data
      journalinfo = journalinfo.getElementsByTagName('JournalIssue')[0]
      volume = journalinfo.getElementsByTagName('Volume')[0].childNodes[0].data
      issue = journalinfo.getElementsByTagName('Issue')[0].childNodes[0].data
      year = journalinfo.getElementsByTagName('Year')[0].childNodes[0].data

      # this is a bit odd? the pages are stored outside the <Journal> element
      pages = xmldoc.getElementsByTagName('MedlinePgn')[0].childNodes[0].data

      output = []
      output.append(title)
      output.append('') #empty line
      output.append(', '.join(authorlist))
      output.append( '%s %s, %s (%s):%s' % (journal, year, volume, issue, pages) )
      output.append('') #empty line
      output.append(abstract)
      return output

  if __name__ == '__main__':
      from sys import argv, exit
      if len(argv) == 1:
          print('Usage: %s <query>' % argv[0])
          print(' e.g. %s 10.1038/ng1946' % argv[0])
          exit()

      citation = get_citation_from_doi(argv[1])
      for line in text_output(citation):
          print(line)
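
One caveat with `text_output`: every field is read with `getElementsByTagName(...)[0]`, so a record that lacks an abstract or an issue number will fail with an IndexError. A more forgiving lookup might look something like the sketch below. The `first_text` helper is hypothetical (it is not part of the script above), and the sample record is hand-written and deliberately missing those two fields.

```python
from xml.dom import minidom

# Hand-written record with no <Issue> and no <AbstractText>.
SAMPLE_RECORD = """<PubmedArticle><MedlineCitation><Article>
  <Journal>
    <JournalIssue><Volume>1</Volume><PubDate><Year>2007</Year></PubDate></JournalIssue>
    <Title>Example Journal</Title>
  </Journal>
  <ArticleTitle>An example title</ArticleTitle>
</Article></MedlineCitation></PubmedArticle>"""


def first_text(node, tag, default=''):
    """Return the text of the first <tag> under node, or default if missing."""
    found = node.getElementsByTagName(tag)
    if not found:
        return default
    # Join the direct text children; an empty element falls back to default.
    text = ''.join(child.data for child in found[0].childNodes
                   if child.nodeType == child.TEXT_NODE)
    return text or default


doc = minidom.parseString(SAMPLE_RECORD)
print(first_text(doc, 'ArticleTitle'))
print(first_text(doc, 'Issue', 'n/a'))
```

Swapping the `[0].childNodes[0].data` lookups for something like this would let the script print a partial citation instead of crashing.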

--Simon