How to Be A PubMed Historian

By Neuroskeptic, May 18, 2010 3:05 PM



Quite a lot of people seem to like those graphs I sometimes make showing the number of papers published about a certain topic in any given year, based on the number of PubMed hits.

But how do I do it? Surely I don't sit there manually searching PubMed for each term, for each year, right? That would mean dozens, maybe hundreds, of manual searches. Well, unfortunately, that is exactly how I've done it in the past. I really am that cool, see.

Actually it doesn't take very long once you get into the swing of it, but I've now worked out a better way. See below for a script which repeatedly searches PubMed for a given sequence of years, downloads the first page of the results, picks out the bit where it tells you how many hits you got, and puts it all into a single output text file ready to be pasted into Excel or whatever. This comes with no guarantees whatsoever, but it seems to work. Enjoy...
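To make the "picks out the bit" step concrete, here is a minimal sketch of pulling the hit count out of a saved results page. It assumes the page carries a tag shaped like `<meta name="ncbi_resultcount" content="1234" />`. Only the name `ncbi_resultcount` and the "false means zero" rule come from the script; the exact tag layout and the function name `extract_count` are my guesses for illustration.

```shell
#!/bin/bash
# Sketch: read a results page on stdin and print the hit count.
# Assumption: the count lives in a tag shaped like
#   <meta name="ncbi_resultcount" content="1234" />
# The tag layout is a guess; extract_count is a hypothetical helper name.

extract_count() {
  # Keep only the content value of the ncbi_resultcount tag,
  # take the first match, and report "false" as 0.
  sed -n 's/.*name="ncbi_resultcount"[^>]*content="\([^"]*\)".*/\1/p' |
    head -n 1 |
    sed 's/^false$/0/'
}

# Example with a hypothetical one-line page:
printf '%s\n' '<meta name="ncbi_resultcount" content="1234" />' | extract_count
```

If the tag is missing, the function prints nothing, which is easier to spot in the output than a wrong number.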

Edit 29/06/2010: Vastly improved version that searches for multiple different terms sequentially, accepts terms that include spaces, and outputs the data into a sensible format.

The search term text file should be a plain text file containing one search term per line, e.g.:

serotonin depression
dopamine depression
GABA depression

This would search for each of those terms and output the data for each year into a single text file (with three data columns in this case), which is good for comparing the relative popularity of many different terms across time.
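The layout of that output file can be sketched offline. This is only an illustration of the table shape (a header row, then one tab-separated row per year), not the real lookup: `count_hits` is a made-up placeholder that always returns 0, and `build_table` is a hypothetical name.

```shell
#!/bin/bash
# Sketch of the output layout: a header row, then one row per year,
# with tab-separated columns (one column per search term).
# count_hits is a hypothetical stand-in for the real PubMed lookup.

count_hits() {
  # Placeholder: ignores the term ($1) and year ($2) and returns 0
  echo 0
}

build_table() {
  # $1 = file with one search term per line, $2 = start year, $3 = end year
  # (plain variables, so they are global to the script)
  terms_file=$1; year=$2; end=$3

  # Header row: YEAR, then each search term as a column title
  printf 'YEAR'
  while IFS= read -r term || [ -n "$term" ]; do
    if [ -n "$term" ]; then printf '\t%s' "$term"; fi
  done < "$terms_file"
  printf '\n'

  # One row per year, one count per term
  while [ "$year" -le "$end" ]; do
    printf '%s' "$year"
    while IFS= read -r term || [ -n "$term" ]; do
      if [ -n "$term" ]; then printf '\t%s' "$(count_hits "$term" "$year")"; fi
    done < "$terms_file"
    printf '\n'
    year=$((year + 1))
  done
}
```

Calling `build_table list_of_terms.txt 2000 2005 > dope.txt` would give a file that pastes straight into a spreadsheet, with blank lines in the terms file skipped.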


#! /bin/bash
# 29.06.2010
# PubMedHistory script by Neuroskeptic http://neuroskeptic.blogspot.com
# script to find out how many PubMed hits for a certain string in a given year range.

# usage: script (search term text file) (start year) (end year) (output file)
# e.g script list_of_terms.txt 2000 2005 dope.txt

#first, print the HEADER line of the output file.

printf "YEAR\t" > "$4"
cat "$1" | while read subject
do
    #pre-format the subject to remove spaces (all of them, not just the first)
    ffa=${subject//' '/%20}
    echo -n "$ffa" >> "$4"
    printf "\t" >> "$4"
done
#and a newline
printf "\n" >> "$4"

#Now the real thing. The main loop is a YEAR loop:

for (( yearz=$2; yearz<=$3; yearz++ ))
do
    #For each year, create a temporary file t.txt containing the output for this line.
    #First, the year, then a tab.

    printf "$yearz\t" > t.txt

    #now, a second loop to go through the list of searches
    cat "$1" | while read subject
    do
        one=${subject//' '/%20}
        wget -O $yearz.txt "http://www.ncbi.nlm.nih.gov/sites/entrez?term=${one}+${yearz}[Publication Date]"

        #find the line in the output with what we're interested in
        output=`cat $yearz.txt | grep ncbi_resultcount | awk '{print}'`
        #now, change it to get rid of the bit containing the search term
        #as this will screw up the next step if it contains spaces!
        output=${output/content*publication/LOL}
        #print to a temp file
        echo $output > "temp$one$2$3$4.txt"
        #find the bit we want using awk
        output=`awk '{ print $22 }' "temp$one$2$3$4.txt"`
        rm "temp$one$2$3$4.txt"
        rm $yearz.txt
        #trim output
        trimmedout=${output#content=\"}
        trimmedoutB=${trimmedout%\"}
        #replace "false" with 0 because that's what "false" means
        trimmedoutC=${trimmedoutB/'false'/0}
        echo in year $yearz , I got $trimmedoutC. Saving to temp file t.txt
        #write the result, and a tab, to the TEMPORARY output file
        printf "$trimmedoutC\t" >> t.txt
    done
    #Now we've done all the search terms for this YEAR, so send the temporary data to the final file
    cat t.txt >> "$4"
    #and give it a newline
    printf "\n" >> "$4"
done
rm t.txt
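One step worth isolating is how a search term and a year become a query URL. Here is a small sketch, using the same address the script hands to wget, with every space encoded as %20. The helper names `encode_spaces` and `build_url` are mine, and the address reflects PubMed as it was in 2010, so I make no promises that it still answers the same way.

```shell
#!/bin/bash
# Sketch: build the query URL for one search term and one year.
# encode_spaces and build_url are hypothetical helper names.
# The address is the one used in the script; it may not behave the same today.

encode_spaces() {
  # Replace every space with %20 (other reserved characters are left alone)
  printf '%s' "$1" | sed 's/ /%20/g'
}

build_url() {
  # $1 = search term, $2 = year
  printf 'http://www.ncbi.nlm.nih.gov/sites/entrez?term=%s+%s[Publication%%20Date]\n' \
    "$(encode_spaces "$1")" "$2"
}

# Example: print the URL for one term and one year
build_url 'serotonin depression' 2000
```

The printed URL could then be passed to `wget -O` in place of the inline address.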


Copyright © 2021 Kalmbach Media Co.