Retrieving PubMed literature data

The Entrez module in Biopython can be used to search PubMed and to retrieve literature records.
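As a rough illustration of the search side, the sketch below uses Entrez.esearch to look up the PubMed IDs that match a query. The helper names build_term and search_pubmed, the choice of field tags, and the placeholder e-mail address are assumptions made for this illustration and do not come from the original script.

```python
def build_term(author, year):
    """Build a PubMed query string (hypothetical helper for this sketch)."""
    # [Author] and [dp] (date of publication) are PubMed search field tags.
    return '{}[Author] AND {}[dp]'.format(author, year)


def search_pubmed(term, retmax=20):
    """Return the PubMed IDs matching term (requires network access)."""
    # Imported here so that build_term can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = 'your-email@address.com'  # placeholder; use your own address
    handle = Entrez.esearch(db='pubmed', term=term, retmax=retmax)
    result = Entrez.read(handle)
    handle.close()
    # 'IdList' holds the matching PubMed IDs as strings.
    return list(result['IdList'])
```

Calling search_pubmed(build_term('Kadota K', 2013)) would return the matching IDs as strings, which can then be passed to Entrez.efetch.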

Retrieving a PubMed record

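The following Python script takes a PubMed ID and retrieves the authors, title, journal name, and related fields. Author names and titles may contain non-ASCII characters (á, ê, ç, and so on). Python 3 strings are Unicode, so these characters pass through unchanged and the reference is returned as a str; calling encode on it would produce bytes, which cannot be joined with '\n'.

from Bio import Entrez


def get_ref(pubmed_id):
    # NCBI asks for a contact address with every request; replace with your own.
    Entrez.email = 'your-email@address.com'
    fet = Entrez.efetch(db='pubmed', retmode='xml', id=pubmed_id)
    dat = Entrez.read(fet)
    fet.close()
    # Recent Biopython releases return a dict with a 'PubmedArticle' list;
    # older releases returned the list directly. Handle both.
    records = dat['PubmedArticle'] if 'PubmedArticle' in dat else dat
    dat_article = records[0]['MedlineCitation']['Article']

    # Each field defaults to an empty string when the record lacks it.
    title = ''
    journal_name = ''
    volume = ''
    issue = ''
    pages = ''
    date = ''

    try:
        title = dat_article['ArticleTitle']
    except KeyError:
        pass
    try:
        journal_name = dat_article['Journal']['ISOAbbreviation']
    except KeyError:
        pass
    try:
        volume = dat_article['Journal']['JournalIssue']['Volume']
    except KeyError:
        pass
    try:
        issue = '(' + dat_article['Journal']['JournalIssue']['Issue'] + ')'
    except KeyError:
        pass
    try:
        pages = ':' + dat_article['Pagination']['MedlinePgn']
    except KeyError:
        pass
    try:
        date = dat_article['Journal']['JournalIssue']['PubDate']['Year']
    except KeyError:
        pass

    authors = []
    for au in dat_article.get('AuthorList', []):
        if 'LastName' in au:
            authors.append((au['LastName'] + ' ' + au.get('Initials', '')).strip())
        elif 'CollectiveName' in au:
            # Group authors carry a CollectiveName instead of LastName/Initials.
            authors.append(au['CollectiveName'])
    authors = ', '.join(authors)

    ref_field = authors + '. ' + title + ' ' + journal_name + ' ' + \
                date + ', ' + volume + issue + pages + ' [PMID: ' + pubmed_id + ']'

    # Return a str; no encode() call is needed in Python 3.
    return ref_field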

if __name__ == '__main__':
    ref_field = []
    ref_field.append(get_ref('23104886'))
    ref_field.append(get_ref('23837715'))
    print('\n'.join(ref_field))

Sample output:

Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 2013, 29(1):15-21 [PMID: 23104886]
Sun J, Nishiyama T, Shimizu K, Kadota K. TCC: an R package for comparing tag count data with robust normalization strategies. BMC Bioinformatics 2013, 14:219 [PMID: 23837715]
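The script above sends one request per PubMed ID. Entrez.efetch also accepts a comma-separated list of IDs, so several records can be fetched in a single call. The sketch below is one possible way to do that, not part of the original script: the names format_citation and get_refs are hypothetical, the e-mail address is a placeholder, and the handling of the parsed result assumes that recent Biopython releases return a dict with a 'PubmedArticle' list. The formatting step is kept separate from the network call so that it can be checked offline.

```python
def format_citation(article, pubmed_id):
    """Format one parsed 'Article' record as a single reference line."""
    journal = article.get('Journal', {})
    issue_info = journal.get('JournalIssue', {})

    authors = []
    for au in article.get('AuthorList', []):
        if 'LastName' in au:
            authors.append((au['LastName'] + ' ' + au.get('Initials', '')).strip())
        elif 'CollectiveName' in au:
            # Group authors carry a CollectiveName instead of LastName/Initials.
            authors.append(au['CollectiveName'])

    issue = issue_info.get('Issue', '')
    pages = article.get('Pagination', {}).get('MedlinePgn', '')
    return '{}. {} {} {}, {}{}{} [PMID: {}]'.format(
        ', '.join(authors),
        article.get('ArticleTitle', ''),
        journal.get('ISOAbbreviation', ''),
        issue_info.get('PubDate', {}).get('Year', ''),
        issue_info.get('Volume', ''),
        '(' + issue + ')' if issue else '',
        ':' + pages if pages else '',
        pubmed_id,
    )


def get_refs(pubmed_ids):
    """Fetch several PubMed records in one efetch call (requires network access)."""
    # Imported here so that format_citation can be used without Biopython.
    from Bio import Entrez

    Entrez.email = 'your-email@address.com'  # placeholder; use your own address
    handle = Entrez.efetch(db='pubmed', retmode='xml', id=','.join(pubmed_ids))
    data = Entrez.read(handle)
    handle.close()
    # Assumption: recent Biopython returns a dict with a 'PubmedArticle' list;
    # older releases returned the list directly.
    records = data['PubmedArticle'] if 'PubmedArticle' in data else data
    refs = []
    for rec in records:
        citation = rec['MedlineCitation']
        # Take the PMID from the record itself rather than relying on input order.
        refs.append(format_citation(citation['Article'], str(citation['PMID'])))
    return refs
```

With network access, print('\n'.join(get_refs(['23104886', '23837715']))) is intended to produce the same kind of lines as the sample output while issuing one request instead of two.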