
Ruby script to get a batch of MARC records

At work we bought a lot of new books for our new library. Our libraries use a program called LibraryPro, which can import MARC records to add books to the catalog. To save ourselves some time, I looked on the web for a script to fetch MARC records from the Internet and found this excellent script from William Denton. Because I needed to fetch records in batches and rewrite each record's Dewey Decimal number to follow our custom format (number/category/author), I modified his script to do that. This is how it looks now:
#!/usr/local/bin/ruby -w

# Script to get a batch of MARC records
#
# Original code taken from http://www.miskatonic.org/library/zmarc.html
# original programmer
# William Denton wtd@pobox.com
# April 2007
# modifications made by Juan Pablo Tarquino http://jptarqu.blogspot.com
# Released under the MIT License.

# Copyright (c) 2007  William Denton
# Copyright (c) 2008  Juan Pablo Tarquino
#
# Permission is hereby granted, free of charge, to any person
# obtaining a copy of this software and associated documentation
# files (the "Software"), to deal in the Software without
# restriction, including without limitation the rights to use,
# copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following
# conditions:
#
# The above copyright notice and this permission notice shall be
# included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
# OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
# WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
# OTHER DEALINGS IN THE SOFTWARE.

# INSTALLATION NOTES
#
# Note: this script has been tested only on Linux. You may need
#  extra steps to make it work on Windows or Mac.
#
# Install 'yaz' because ruby-zoom depends on it. On Debian:
# # sudo apt-get install yaz
#
# Requires the ruby-zoom package and the ruby-marc gem.
#
# ruby-zoom: http://ruby-zoom.rubyforge.org/
#
# ruby-marc: http://www.textualize.com/ruby_marc
# You can install ruby-marc by running
# # gem install marc
#
# NOTE ruby-zoom installs its own marc.rb file that will conflict with
# ruby-marc! You will need to delete ruby-zoom's marc.rb or rename it
# for ruby-marc to work.
#
# On my system it was installed in
#   /usr/local/lib/ruby/site_ruby/1.8/marc.rb
# but you'll have to look for it wherever your system put it.

# USAGE
#
# Enter your ISBN numbers in a file called 'in_isbn' in the same
# folder as this script. One ISBN number per line. Then using the
# terminal change current directory to the folder that contains
# this script and then execute:
#
# ./zmarc.rb
#
# A new file (export_marc.txt) will be created with the MARC records
# found. ISBNs that no server could resolve are listed in err_marc.txt.


require 'rubygems'
require 'zoom'
require 'marc'
require 'yaml'
require 'stringio' # StringIO is used below to feed MARCXML to the reader


# Given an ISBN and some Z39.50 information, return MARCXML.  Why
# MARCXML?  Because (now) the ruby-zoom module can't return a
# ruby-marc MARC object. It can, however, return MARCXML, which
# ruby-marc can grok, so we translate it into XML and then back.

class ZMarc

 OUT_FILE_NAME = 'export_marc.txt'
 ERR_FILE_NAME = 'err_marc.txt'
 def self.z3950query(isbn, host, port, db)
   begin
     ZOOM::Connection.open(host, port) do |conn|
       conn.database_name = db
       conn.preferred_record_syntax = 'MARC21'
       rset = conn.search("@attr 1=7 #{isbn}")
       return rset[0].xml
     end
   rescue StandardError => e
     # Connection failures and empty result sets both end up here.
     # puts e # Uncomment to see any server errors
     return nil
   end
 end

 def self.import_records(isbn_numbers)
  
   #writer = MARC::Writer.new('marc.dat')
   err_file = File.open(ERR_FILE_NAME,"w")
   marc_file = MARC::Writer.new(OUT_FILE_NAME)
   servers = [
     # Reorder these so that your preferred servers are first
     # North America
     ['z3950.loc.gov',              7090, 'Voyager'     ], # Library of Congress
     ['www.saclibrarycatalog.org',   210, 'INNOPAC'     ], # Sacramento Pub Lib
    
     ['sirsi.library.utoronto.ca',  2200, 'unicorn'     ], # U Toronto
     ['amicus.collectionscanada.ca', 210, 'NL'          ], # Lib & Archives Canada
     ['aleph.mcgill.ca',             210, 'MUSE'        ], # McGill
     # ['ualapp.library.ualberta.ca', 2200, 'unicorn',    ], # U Alberta
     ['portage.library.ubc.ca',     7090, 'voyager'     ], # UBC
     ['catnyp.nypl.org',             210, 'INNOPAC'     ], # New York Pub Lib
     ['library.mit.edu',            9909, 'mit01pub'    ], # MIT
     ['prodorbis.library.yale.edu', 7090, 'voyager'     ], # Yale
     ['catalog.princeton.edu',      7090, 'voyager'     ], # Princeton
     ['ipac.lib.uchicago.edu',       210, 'usmarc'      ], # Chicago
     ['library.bu.edu',              210, 'INNOPAC'     ], # Boston U
     ['voyager.wrlc.org',           7090, 'voyager'     ], # Wash Res Lib Consor
     ['catalog.lib.jhu.edu',         210, 'horizon'     ], # Johns Hopkins
     ['z3950.lib.umich.edu',         210, 'miu01_pub'   ], # U Michigan
     ['catalog.library.cornell.edu',7090, 'voyager'     ], # Cornell
     # UK and Ireland
     ['library.ucc.ie',              210, 'INNOPAC'     ], # U College Cork
     ['library.ox.ac.uk',            210, 'MAIN*BIBMAST'], # Oxford
     ['z3950.nls.uk',               7290, 'voyager'     ], # Scottish Nat Lib
     ['lib-15.lse.ac.uk',           7090, 'voyager'     ], # LSE
     ['libsys.lib.hull.ac.uk',       210, 'INNOPAC'     ], # Hull
     # Europe (non-English)
     ['sigma.nkp.cz',               9909, 'NKC'         ], # Nat Lib Czech R
     ['lib.mpib-berlin.mpg.de',     2020, 'opac'        ], # Max Planck Inst
     ['ubsun02.biblio.etc.tu-bs.de',2020, 'bac'         ], # Bibliotheken Berlins
     ['z3950.kb.dk',                2100, 'KGL01'       ], # Kongelige Bibliothek
     ['www.bne.es',                 2210, 'BIMO'        ], # Nat Lib Spain
     ['roble.unizar.es',             210, 'INNOPAC'     ], # U Zaragoza
     ['www.helmet.fi',               210, 'INNOPAC'     ], # Helsinki Lib
     ['carmin.sudoc.abes.fr',        210, 'ABES-Z39-PUBLIC' ], # France
     ['gofor.bibli.ens-cachan.fr', 21210, 'ADVANCE'     ], # French school
     ['gofor.bibli.ens-cachan.fr', 21210, 'MAIN*BIBMAST'], # French school
     ['isis.cilea.it',              2100, 'usmarc'      ], # U Brescia
     ['aleph.library.tudelft.nl',   9909, 'tud01'       ], # Techn U Delft
     ['z3950.bibsys.no',            2100, 'BIBSYS'      ], # Nat Lib Norway
     ['z3950.nb.no',                2100, 'norbok'      ], # Nat Lib Norway
     ['alpha.bn.org.pl',             210, 'INNOPAC'     ], # Nat Lib Poland
     ['z3950.btj.se',                210, 'BURK'        ], # Sweden
     # ['lbsihol.unimaas.nl',         7190, 'lbs'         ], # U Maastricht
     # Australia and New Zealand
     ['catalogue.nla.gov.au',       7090, 'voyager'     ], # Nat Lib Australia
     ['nlnzcat.natlib.govt.nz',     7190, 'voyager'     ], # Nat Lib New Zealand
     # Asia
     ['library.cuhk.edu.hk',         210, 'INNOPAC'     ], # Chinese U HK
     ['linc.nus.edu.sg',             210, 'INNOPAC'     ], # Nat U Singapore
     ['nbinet.ncl.edu.tw',           210, 'INNOPAC'     ], # Nat Cent Lib Taiwan
     # ['wine.wul.waseda.ac.jp',       210, 'INNOPAC'     ], # Waseda U
     # Africa
     ['explore.up.ac.za',            210, 'INNOPAC'     ], # U Pretoria
     # ['natlib1.unisa.ac.za',         210, 'INNOPAC'     ], # Nat Lib South Africa

   ]

   total = 0
   #isbn_numbers = "978-0-545-05471-3,0-8037-2842-5,978-0-7642-0184-4,978-0-7586-1270-0,978-1-883551-45-2, 0-7847-1512-2, 978-1-5914-5447-2, 978-0-590-29972-5,0-439-81111-2,978-0-545-01162-4".gsub('-','').split(',')
   for isbn in isbn_numbers
    
     found = false
     # isbn = "978-1-883551-45-2"
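     # Normalize: uppercase a trailing 'x' and drop hyphens, spaces, etc.
     isbn = isbn.upcase.gsub(/[^0-9X]/, '')

     # Shape check only: 10 characters, or 13 starting with 978/979.
     if (! /\A(97[89])?\d{9}[0-9X]\z/.match(isbn))
       puts "Not a valid ISBN: #{isbn}" # Not a true validity check!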
     else

       # Two lists of open Z39.50 servers:
       #   http://targettest.indexdata.com/
       #   http://staff.library.mun.ca/staff/toolbox/z3950hosts.htm

   

       # Now the real business.  Loop through the servers listed above,
       # querying each one for the ISBN until one answers or we run out.

       servers.each do |server|
         marcxml = z3950query(isbn, server[0], server[1], server[2])
         unless marcxml.nil?
           reader = MARC::XMLReader.new(StringIO.new(marcxml))
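           reader.each do |record|
             # Start a fresh output record for each result and carry over
             # the leader, which would otherwise be left blank.
             new_record = MARC::Record.new
             new_record.leader = record.leader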
             # Would be good to have an option or something so that people
             # wouldn't have to see the leader and other early fields and
             # possibly less interesting fields such as 9xx (local information).
             # Some libraries have lots of 852 (holdings) fields which
             # fill up the screen.
             #puts record.to_yaml
             puts "#{server[0]} ..."
             found = true
          
             if record['100'].nil?
               author_name = ' '*3
             else
               author_name = "#{record['100']['a']}   "[0..2]
             end
             puts author_name
      
             unless record['082'].nil?
      
               dewey_decimal = record['082']['a'].to_s
               #check if it already contains the 3 parts, add them if missing
               dewey_number_parts = dewey_decimal.split('/')
               if dewey_number_parts[1] == nil
                 dewey_number_parts[1] = 'EFic'
               end
               if dewey_number_parts[2] == nil
                 dewey_number_parts[2] = author_name
               end
    
               new_dewey_decimal = dewey_number_parts.join('/')
               puts new_dewey_decimal
               #          new_data_field = MARC::DataField.new('082','0','0',
               #            ['a', new_dewey_decimal],['2', record['082']['2'].to_s])

               #          record.append(new_data_field)
               #marc_raw_data = marc_raw_data.gsub(dewey_decimal, new_dewey_decimal)
             end
      
             # Copy every field to new_record, swapping in the rewritten 082
             record.each do |field|
               if field.tag == '082'
                 new_data_field = MARC::DataField.new('082','0','0',
                   ['a', new_dewey_decimal],['2', field['2'].to_s])

                 new_record.append(new_data_field)
               else
                 new_record.append(field)
       
               end
             end
             marc_file.write new_record
           end
           total = total + 1
           puts total
           break
         end
       end
       #puts "Sorry, nothing found for #{isbn}"
       err_file.puts isbn unless found
     end
   end
   marc_file.close
   err_file.close
 end
end

isbn_numbers = IO.read("in_isbn").split("\n")
ZMarc.import_records(isbn_numbers)
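The call number rewrite (number/category/author) is the part of the script most likely to need adjusting for another library, so here is that step pulled out on its own as a minimal sketch. The helper name build_call_number and its default_category parameter are made up for illustration and are not part of the script; the 'EFic' default and the three-character author code mirror what the script does.

```ruby
# Hypothetical helper (not in the script above): fills in the missing
# parts of a number/category/author call number.
def build_call_number(dewey, author, default_category = 'EFic')
  parts = dewey.to_s.split('/')
  parts[1] ||= default_category      # category, if the record has none
  parts[2] ||= "#{author}   "[0..2]  # first 3 characters, space-padded
  parts.join('/')
end

puts build_call_number('813.54', 'Rowling, J. K.')    # 813.54/EFic/Row
puts build_call_number('813.54/Fic/ABC', 'Rowling')   # 813.54/Fic/ABC
```

Parts that are already present in the record are left untouched; only the missing ones are filled in.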
Feel free to use this script. It will only work with MRI Ruby because ruby-zoom is a native extension. Stay tuned to this blog for a JRuby script that gets the MARC records from the Library of Congress's website.
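As the comment in the script admits, its ISBN test only looks at the shape of the number. If you want to catch typos in the in_isbn file before querying any server, a check-digit test along these lines could be used instead. This is a sketch, not part of the original script, and the method name valid_isbn? is my own.

```ruby
# Hypothetical helper: true when the ISBN-10 or ISBN-13 check digit is correct.
def valid_isbn?(raw)
  isbn = raw.to_s.upcase.gsub(/[^0-9X]/, '')
  case isbn
  when /\A\d{9}[0-9X]\z/
    # ISBN-10: weights 10 down to 1, 'X' counts as 10, sum divisible by 11
    sum = 0
    isbn.chars.each_with_index do |ch, i|
      sum += (ch == 'X' ? 10 : ch.to_i) * (10 - i)
    end
    sum % 11 == 0
  when /\A\d{13}\z/
    # ISBN-13: weights alternate 1, 3, 1, 3, ...; sum divisible by 10
    sum = 0
    isbn.chars.each_with_index do |ch, i|
      sum += ch.to_i * (i.even? ? 1 : 3)
    end
    sum % 10 == 0
  else
    false
  end
end

puts valid_isbn?('0-439-81111-2')      # true
puts valid_isbn?('978-0-545-01162-4')  # true
puts valid_isbn?('0-439-81111-3')      # false (wrong check digit)
```

In the script, this would replace the regular expression test inside the loop over isbn_numbers.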
