Wednesday, September 24, 2008

Ruby script to get a batch of MARC records

At work we bought a lot of new books for our new library. Our libraries use a program called LibraryPro, which can import MARC records to add books to the catalog. To save some time, I looked on the web for a script that fetches MARC records from the Internet and found this excellent script by William Denton. I needed to fetch records in batches and rewrite each record's call number to follow our custom Dewey Decimal format (number/category/author), so I modified his script to do that. This is how it looks now:
#!/usr/local/bin/ruby -w

# Script to get a batch of MARC records
#
# original code taken from http://www.miskatonic.org/library/zmarc.html
# original programmer
# William Denton wtd@pobox.com
# April 2007
# modifications made by Juan Pablo Tarquino http://jptarqu.blogspot.com
# Released under the MIT License.

# Copyright (c) 2007  William Denton
# Copyright (c) 2008  Juan Pablo Tarquino
#
# Permission is hereby granted, free of charge, to any person
# obtaining a copy of this software and associated documentation
# files (the "Software"), to deal in the Software without
# restriction, including without limitation the rights to use,
# copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following
# conditions:
#
# The above copyright notice and this permission notice shall be
# included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
# OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
# WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
# OTHER DEALINGS IN THE SOFTWARE.

# INSTALLATION NOTES
#
# Note: this script has been tested only on Linux. You may have to
#  take extra steps to make it work on Windows or Mac.
#
# Install 'yaz' because ruby-zoom depends on it. In Debian:
# # sudo apt-get install yaz
#
# Requires the ruby-zoom package and the ruby-marc gem.
#
# ruby-zoom: http://ruby-zoom.rubyforge.org/
#
# ruby-marc: http://www.textualize.com/ruby_marc
# You can install ruby-marc by running
# # gem install marc
#
# NOTE: ruby-zoom installs its own marc.rb file, which conflicts with
# ruby-marc! You will need to delete or rename ruby-zoom's marc.rb
# for ruby-marc to work.
#
# On my system it was installed in
#   /usr/local/lib/ruby/site_ruby/1.8/marc.rb
# but you'll have to look for it wherever your system put it.

# USAGE
#
# Enter your ISBNs in a file called 'in_isbn' in the same folder as
# this script, one ISBN per line. Then, in a terminal, change to the
# folder that contains this script, make it executable if you haven't
# already (chmod +x zmarc.rb) and run:
#
# ./zmarc.rb
#
# A new file (export_marc.txt) will be created with the MARC records
# found. ISBNs that pass the format check but are not found on any
# server are listed in err_marc.txt.


require 'rubygems'
require 'stringio'  # StringIO is used to feed MARCXML to ruby-marc
require 'zoom'
require 'marc'
require 'yaml'


# Given an ISBN and some Z39.50 information, return MARCXML.  Why
# MARCXML?  Because (at the time of writing) the ruby-zoom module can't
# return a ruby-marc MARC object. It can, however, return MARCXML, which
# ruby-marc can grok, so we translate it into XML and then back.

class ZMarc

 OUT_FILE_NAME = 'export_marc.txt'
 ERR_FILE_NAME = 'err_marc.txt'
 def self.z3950query(isbn, host, port, db)
   begin
     ZOOM::Connection.open(host, port) do |conn|
       conn.database_name = db
       conn.preferred_record_syntax = 'MARC21'
       rset = conn.search("@attr 1=7 #{isbn}")
       return rset[0].xml
     end
   rescue StandardError => e
     # Rescue StandardError rather than Exception so Ctrl+C can still stop a batch
     # puts e # Uncomment to see any server errors
     return nil
   end
 end

 def self.import_records(isbn_numbers)
  
   #writer = MARC::Writer.new('marc.dat')
   err_file = File.open(ERR_FILE_NAME,"w")
   marc_file = MARC::Writer.new(OUT_FILE_NAME)
   servers = [
     # Reorder these so that your preferred servers are first
     # North America
     ['z3950.loc.gov',              7090, 'Voyager'     ], # Library of Congress
     ['www.saclibrarycatalog.org',   210, 'INNOPAC'     ], # Sacramento Pub Lib
    
     ['sirsi.library.utoronto.ca',  2200, 'unicorn'     ], # U Toronto
     ['amicus.collectionscanada.ca', 210, 'NL'          ], # Lib & Archives Canada
     ['aleph.mcgill.ca',             210, 'MUSE'        ], # McGill
     # ['ualapp.library.ualberta.ca', 2200, 'unicorn',    ], # U Alberta
     ['portage.library.ubc.ca',     7090, 'voyager'     ], # UBC
     ['catnyp.nypl.org',             210, 'INNOPAC'     ], # New York Pub Lib
     ['library.mit.edu',            9909, 'mit01pub'    ], # MIT
     ['prodorbis.library.yale.edu', 7090, 'voyager'     ], # Yale
     ['catalog.princeton.edu',      7090, 'voyager'     ], # Princeton
     ['ipac.lib.uchicago.edu',       210, 'usmarc'      ], # Chicago
     ['library.bu.edu',              210, 'INNOPAC'     ], # Boston U
     ['voyager.wrlc.org',           7090, 'voyager'     ], # Wash Res Lib Consor
     ['catalog.lib.jhu.edu',         210, 'horizon'     ], # Johns Hopkins
     ['z3950.lib.umich.edu',         210, 'miu01_pub'   ], # U Michigan
     ['catalog.library.cornell.edu',7090, 'voyager'     ], # Cornell
     # UK and Ireland
     ['library.ucc.ie',              210, 'INNOPAC'     ], # U College Cork
     ['library.ox.ac.uk',            210, 'MAIN*BIBMAST'], # Oxford
     ['z3950.nls.uk',               7290, 'voyager'     ], # Scottish Nat Lib
     ['lib-15.lse.ac.uk',           7090, 'voyager'     ], # LSE
     ['libsys.lib.hull.ac.uk',       210, 'INNOPAC'     ], # Hull
     # Europe (non-English)
     ['sigma.nkp.cz',               9909, 'NKC'         ], # Nat Lib Czech R
     ['lib.mpib-berlin.mpg.de',     2020, 'opac'        ], # Max Planck Inst
     ['ubsun02.biblio.etc.tu-bs.de',2020, 'bac'         ], # Bibliotheken Berlins
     ['z3950.kb.dk',                2100, 'KGL01'       ], # Kongelige Bibliothek
     ['www.bne.es',                 2210, 'BIMO'        ], # Nat Lib Spain
     ['roble.unizar.es',             210, 'INNOPAC'     ], # U Zaragoza
     ['www.helmet.fi',               210, 'INNOPAC'     ], # Helsinki Lib
     ['carmin.sudoc.abes.fr',        210, 'ABES-Z39-PUBLIC' ], # France
     ['gofor.bibli.ens-cachan.fr', 21210, 'ADVANCE'     ], # French school
     ['gofor.bibli.ens-cachan.fr', 21210, 'MAIN*BIBMAST'], # French school
     ['isis.cilea.it',              2100, 'usmarc'      ], # U Brescia
     ['aleph.library.tudelft.nl',   9909, 'tud01'       ], # Techn U Delft
     ['z3950.bibsys.no',            2100, 'BIBSYS'      ], # Nat Lib Norway
     ['z3950.nb.no',                2100, 'norbok'      ], # Nat Lib Norway
     ['alpha.bn.org.pl',             210, 'INNOPAC'     ], # Nat Lib Poland
     ['z3950.btj.se',                210, 'BURK'        ], # Sweden
     # ['lbsihol.unimaas.nl',         7190, 'lbs'         ], # U Maastricht
     # Australia and New Zealand
     ['catalogue.nla.gov.au',       7090, 'voyager'     ], # Nat Lib Australia
     ['nlnzcat.natlib.govt.nz',     7190, 'voyager'     ], # Nat Lib New Zealand
     # Asia
     ['library.cuhk.edu.hk',         210, 'INNOPAC'     ], # Chinese U HK
     ['linc.nus.edu.sg',             210, 'INNOPAC'     ], # Nat U Singapore
     ['nbinet.ncl.edu.tw',           210, 'INNOPAC'     ], # Nat Cent Lib Taiwan
     # ['wine.wul.waseda.ac.jp',       210, 'INNOPAC'     ], # Waseda U
     # Africa
     ['explore.up.ac.za',            210, 'INNOPAC'     ], # U Pretoria
     # ['natlib1.unisa.ac.za',         210, 'INNOPAC'     ], # Nat Lib South Africa

   ]

   total = 0
   #isbn_numbers = "978-0-545-05471-3,0-8037-2842-5,978-0-7642-0184-4,978-0-7586-1270-0,978-1-883551-45-2, 0-7847-1512-2, 978-1-5914-5447-2, 978-0-590-29972-5,0-439-81111-2,978-0-545-01162-4".gsub('-','').split(',')
   for isbn in isbn_numbers
    
     found = false
     # isbn = "978-1-883551-45-2"
     # Keep only digits and X (upcased first so a lowercase check digit 'x' survives)
     isbn = isbn.upcase.gsub(/[^0-9X]/, '')

     if (! /\A(97[89])?\d{9}[0-9X]\z/.match(isbn))
       puts "This is not a valid ISBN #{isbn}" # Pattern check only, not a true validity check!
     else

       # Two lists of open Z39.50 servers:
       #   http://targettest.indexdata.com/
       #   http://staff.library.mun.ca/staff/toolbox/z3950hosts.htm

   

       # Now the real business.  Loop through all the servers listed above
       # and query each one for the ISBN until one answers or we run out of servers

       servers.each do |server|
         marcxml = z3950query(isbn, server[0], server[1], server[2])
         unless marcxml.nil?
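           reader = MARC::XMLReader.new(StringIO.new(marcxml))
           reader.each do |record|
             # Build a fresh record for each record read and keep the original
             # leader (record.each below copies only the fields, not the leader)
             new_record = MARC::Record.new
             new_record.leader = record.leader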
             # Would be good to have an option or something so that people
             # wouldn't have to see the leader and other early fields and
             # possibly less interesting fields such as 9xx (local information).
             # Some libraries have lots of 852 (holdings) fields which
             # fill up the screen.
             #puts record.to_yaml
             puts "#{server[0]} ..."
             found = true
          
             if record['100'].nil?
               author_name = ' '*3
             else
               author_name = "#{record['100']['a']}   "[0..2]
             end
             puts author_name
      
             unless record['082'].nil?
      
               dewey_decimal = record['082']['a'].to_s
               # Check whether it already has all 3 parts (number/category/author); add any missing ones
               dewey_number_parts = dewey_decimal.split('/')
               if dewey_number_parts[1] == nil
                 dewey_number_parts[1] = 'EFic'
               end
               if dewey_number_parts[2] == nil
                 dewey_number_parts[2] = author_name
               end
    
               new_dewey_decimal = dewey_number_parts.join('/')
               puts new_dewey_decimal
               #          new_data_field = MARC::DataField.new('082','0','0',
               #            ['a', new_dewey_decimal],['2', record['082']['2'].to_s])

               #          record.append(new_data_field)
               #marc_raw_data = marc_raw_data.gsub(dewey_decimal, new_dewey_decimal)
             end
      
             # Copy the fields to new_record, replacing 082 with the rewritten call number
             record.each do |field|
               if field.tag == '082'
                 new_data_field = MARC::DataField.new('082','0','0',
                   ['a', new_dewey_decimal],['2', field['2'].to_s])

                 new_record.append(new_data_field)
               else
                 new_record.append(field)
       
               end
             end
             marc_file.write new_record
           end
           total = total + 1
           puts total
           break
         end
       end
       #puts "Sorry, nothing found for #{isbn}"
       err_file.puts isbn unless found
     end
   end
   marc_file.close
   err_file.close
 end
end

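# Read one ISBN per line, skipping blank lines
isbn_numbers = IO.read("in_isbn").split("\n").map { |line| line.strip }.reject { |line| line.empty? }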
ZMarc.import_records(isbn_numbers)
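The custom call number is the main change from William Denton's original, and since it only manipulates strings it can be tried without a Z39.50 server or ruby-marc. Below is a minimal sketch that pulls the 082/100 logic out of `import_records` into a standalone method. `custom_call_number` is a hypothetical name that is not part of the script; 'EFic' is simply the default category the script already uses, and the sample Dewey numbers and author names are made up for illustration.

```ruby
# Hypothetical helper mirroring the script's 082 rewrite: fill in the
# missing category and author parts of a Dewey number (number/category/author).
def custom_call_number(dewey, author, default_category = 'EFic')
  # First three characters of the author heading (100 $a), padded with
  # spaces, the same trick the script uses when building author_name.
  author_code = "#{author}   "[0..2]
  parts = dewey.to_s.split('/')
  parts[1] ||= default_category # add the category only if missing
  parts[2] ||= author_code      # add the author code only if missing
  parts.join('/')
end

puts custom_call_number('813.54', 'Rowling, J. K.')  # prints 813.54/EFic/Row
puts custom_call_number('599.75/Ref', 'Smith, John') # prints 599.75/Ref/Smi
puts custom_call_number('823.914/YA/ABC', 'Anyone')  # prints 823.914/YA/ABC
```

Using `||=` means a call number that already has all three parts is left untouched, which matches what the script does when `dewey_number_parts[1]` and `[2]` are already present.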
Feel free to use this script. It only works with MRI Ruby, because ruby-zoom is a native C extension. Stay tuned to this blog for a JRuby script that gets the MARC records from the Library of Congress's website.
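The script's ISBN test is only a pattern match, as its own comment admits, so a mistyped ISBN still gets sent to every server before ending up in err_marc.txt. The check digit catches most typos up front. This is a minimal sketch using the standard ISBN-10 (weights 10 down to 1, mod 11, X = 10) and ISBN-13 (alternating weights 1 and 3, mod 10) rules; `valid_isbn?` is a hypothetical helper, not part of the original script. The two valid samples come from the script's commented-out test list, and the third has a deliberately altered check digit.

```ruby
# Hypothetical helper: check-digit validation for ISBN-10 and ISBN-13.
def valid_isbn?(raw)
  isbn = raw.to_s.upcase.gsub(/[^0-9X]/, '') # strip hyphens and spaces
  case isbn.length
  when 10
    return false unless isbn =~ /\A\d{9}[\dX]\z/ # X allowed only as check digit
    # ISBN-10: weights 10..1, 'X' counts as 10, sum must be divisible by 11
    sum = 0
    isbn.chars.each_with_index do |c, i|
      value = (c == 'X') ? 10 : c.to_i
      sum += value * (10 - i)
    end
    sum % 11 == 0
  when 13
    return false unless isbn =~ /\A\d{13}\z/
    # ISBN-13: alternating weights 1 and 3, sum must be divisible by 10
    sum = 0
    isbn.chars.each_with_index do |c, i|
      sum += c.to_i * (i.even? ? 1 : 3)
    end
    sum % 10 == 0
  else
    false
  end
end

puts valid_isbn?('978-0-545-05471-3') # prints true
puts valid_isbn?('0-8037-2842-5')     # prints true
puts valid_isbn?('0-8037-2842-4')     # prints false (wrong check digit)
```

Calling `valid_isbn?(isbn)` in place of the regular-expression test in `import_records` would send bad lines straight to the "This is not a valid ISBN" message instead of querying every server for them.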
