How to Be A PubMed Historian

By Neuroskeptic
May 18, 2010 8:05 PM | Nov 5, 2019 12:19 AM


Quite a lot of people seem to like those graphs I sometimes make showing the number of papers published about a certain topic in any given year, based on the number of PubMed hits.

But how do I do it? Surely I don't sit there manually searching PubMed for each term, for each year, right? That would mean dozens, maybe hundreds, of manual searches. Well, unfortunately, that is exactly how I've done it in the past. I really am that cool, see.

Actually it doesn't take very long once you get into the swing of it, but I've now worked out a better way. See below for a bash script which repeatedly searches PubMed for a given sequence of years, downloads the first page of the results, picks out the bit where it tells you how many hits you got, and puts it all into a single output text file ready to be pasted into Excel or whatever. This comes with no guarantees whatsoever, but it seems to work. Enjoy...
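
The "picks out the bit where it tells you how many hits you got" step can be tried on its own, without touching the network. What follows is only a rough sketch, not the script's exact method: it assumes the results page carries a tag shaped like `<meta name="ncbi_resultcount" content="1234" />`, the sample HTML is invented for illustration, and the function name `extract_count` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: read a saved results page on stdin, print the hit count.
# ASSUMPTION: the page contains a tag like
#   <meta name="ncbi_resultcount" content="1234" />
# The sample HTML further down is made up for illustration.
extract_count() {
    local count
    # Keep only lines mentioning ncbi_resultcount, then capture content="...".
    count=$(grep 'ncbi_resultcount' \
        | sed -n 's/.*name="ncbi_resultcount"[^>]*content="\([^"]*\)".*/\1/p' \
        | head -n 1)
    # No tag found: report failure rather than inventing a number.
    if [ -z "$count" ]; then
        return 1
    fi
    # The script treats "false" as zero hits, so do the same here.
    if [ "$count" = "false" ]; then
        count=0
    fi
    printf '%s\n' "$count"
}

sample='<html><head>
<meta name="ncbi_resultcount" content="1234" />
</head></html>'

printf '%s\n' "$sample" | extract_count
```

Returning a non-zero status when the tag is missing is a choice made for this sketch; it makes a changed page layout show up as an error instead of a silent blank column.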

Edit 29/06/2010: Vastly improved version that searches for multiple different terms sequentially, accepts terms that include spaces, and outputs the data into a sensible format. The search term text file should be a plain text file containing one search term per line, e.g.:

serotonin depression
dopamine depression
GABA depression

This would search for each of those terms and output the data for each year into a single text file - with three data columns in this case - good for comparing the relative popularity of many different terms across time.
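
To make the file layout concrete, here is a rough sketch of how a terms file could map onto the header row of the output: a YEAR column followed by one tab-separated column per term, with spaces turned into %20. The function name `header_line` and the throwaway terms file are hypothetical, and this is an illustration of the layout rather than the script's own code.

```bash
#!/bin/bash
# Hypothetical helper: build the tab-separated header row for a terms file.
# The terms file holds one search term per line.
header_line() {
    local terms_file="$1"
    local tab line term encoded
    tab=$(printf '\t')
    line='YEAR'
    # "|| [ -n ... ]" also picks up a last line with no trailing newline.
    while IFS= read -r term || [ -n "$term" ]; do
        # Skip blank lines so they do not become empty columns.
        [ -z "$term" ] && continue
        # Replace every space with %20 so each term is a single token.
        encoded=$(printf '%s' "$term" | sed 's/ /%20/g')
        line="$line$tab$encoded"
    done < "$terms_file"
    printf '%s\n' "$line"
}

# Demo with a temporary terms file holding the three example terms.
demo_terms=$(mktemp)
printf '%s\n' 'serotonin depression' 'dopamine depression' 'GABA depression' > "$demo_terms"
header_line "$demo_terms"
rm -f "$demo_terms"
```

The demo prints YEAR followed by the three encoded terms, separated by tabs, which is the row Excel would read as the column headings.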

---

#!/bin/bash
# 29.06.2010
# PubMedHistory script by Neuroskeptic http://neuroskeptic.blogspot.com
# Script to find out how many PubMed hits there are for a certain string in a given year range.

# usage: script (search term text file) (start year) (end year) (output file)
# e.g. script list_of_terms.txt 2000 2005 dope.txt

# First, print the HEADER line of the output file.
printf "YEAR\t" > "$4"
cat "$1" | while read subject
do
    # pre-format the subject to remove spaces (every space, not just the first)
    ffa=${subject// /%20}
    echo -n "$ffa" >> "$4"
    printf "\t" >> "$4"
done
# and a newline
printf "\n" >> "$4"

# Now the real thing. The main loop is a YEAR loop:
for (( yearz=$2; yearz<=$3; yearz++ ))
do
    # For each year, create a temporary file t.txt containing the output for this line.
    # First, the year, then a tab.
    printf "%s\t" "$yearz" > t.txt

    # now, a second loop to go through the list of searches
    cat "$1" | while read subject
    do
        one=${subject// /%20}
        wget -O "$yearz.txt" http://www.ncbi.nlm.nih.gov/sites/entrez?term="$one"+"$yearz"'[Publication Date]'

        # find the line in the output with what we're interested in
        output=`cat "$yearz.txt" | grep ncbi_resultcount | awk '{print}'`
        # now, change it to get rid of the bit containing the search term
        # as this will screw up the next step if it contains spaces!
        output=${output/content*publication/LOL}
        # print to a temp file
        echo $output > "temp$one$2$3$4.txt"
        # find the bit we want using awk
        output=`awk '{ print $22 }' "temp$one$2$3$4.txt"`
        rm "temp$one$2$3$4.txt"
        rm "$yearz.txt"
        # trim output
        trimmedout=${output#content=\"}
        trimmedoutB=${trimmedout%\"}
        # replace "false" with 0 because that's what "false" means
        trimmedoutC=${trimmedoutB/false/0}
        echo in year $yearz , I got $trimmedoutC. Saving to temp file t.txt
        # write the result, and a tab, to the TEMPORARY output file
        printf "%s\t" "$trimmedoutC" >> t.txt
    done
    # Now we've done all the search terms for this YEAR, so send the temporary data to the final file
    cat t.txt >> "$4"
    # and give it a newline
    printf "\n" >> "$4"
done
rm t.txt
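
The overall shape of the script (a year loop on the outside, a term loop on the inside, one tab-separated row per year) can be tried out with the download stubbed out. This is only a sketch under loud assumptions: `fake_count` is a hypothetical placeholder that returns made-up numbers, not PubMed data, and `build_table` is an illustrative name, not part of the script above.

```bash
#!/bin/bash
# Sketch of the year-by-term table layout, with the network fetch stubbed out.
# ASSUMPTION: fake_count is a placeholder returning a made-up number
# (length of the term plus the last digit of the year), NOT real PubMed data.
fake_count() {
    local term="$1"
    local year="$2"
    echo $(( ${#term} + $year % 10 ))
}

# build_table START END TERM... prints one tab-separated row per year.
build_table() {
    local start="$1"
    local end="$2"
    shift 2
    local tab year row term
    tab=$(printf '\t')
    year=$start
    while [ "$year" -le "$end" ]; do
        # Each row starts with the year, then one count per term.
        row=$year
        for term in "$@"; do
            row="$row$tab$(fake_count "$term" "$year")"
        done
        printf '%s\n' "$row"
        year=$((year + 1))
    done
}

build_table 2000 2002 'serotonin depression' 'GABA depression'
```

Swapping `fake_count` for a function that really fetches and parses a results page would give the same table with real numbers. Building each row in a variable inside a plain `for` loop is a design choice of this sketch; a loop fed by a pipe, as in `cat file | while read`, typically runs in a subshell in bash, so variables set inside it do not survive, which is one situation where a temporary file like t.txt comes in handy.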


Copyright © 2024 Kalmbach Media Co.