Hyacinth Name reconciliation subproject

From Columbia Wikibase Test
Revision as of 19:34, 23 September 2020 by TrMendenhall (Changed setup of figures -- unable to add images to WbStack instance)

Preface: In September 2020, Ryan M. ran a large reconciliation job on all temporary name terms from the Hyacinth name and subject_name vocabularies, using a locally installed instance of the Conciliator reconciliation service. Specifically, VIAF was queried for matches in the LCNAF. Because 45,447 personal names are classed as type “temporary,” this subproject focuses on the temporary personal names; the numbers of temporary corporate names are much more manageable. Some goals of the project: 1) clean up our name controlled vocabularies, especially as we may move to Hyacinth v.3 soon; 2) in the longer term, explore how this project could be operationalized to ensure ongoing maintenance of controlled vocabularies in Hyacinth.
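For readers unfamiliar with how a reconciliation job talks to a service such as Conciliator, the following Python sketch builds a batch of queries in the shape described by the Reconciliation Service API that OpenRefine uses (a `queries` parameter holding a JSON object keyed `q0`, `q1`, …) and picks the top-scoring candidate from an already parsed response. It is a sketch under stated assumptions, not the job Ryan M. ran: the endpoint URL and all helper names are hypothetical, no network request is made, and the exact Conciliator path for VIAF limited to LC should be checked against your own installation.

```python
import json
from urllib.parse import urlencode

# Hypothetical local Conciliator endpoint; the path used for VIAF limited to
# LC is an assumption, so confirm it against your own installation.
ENDPOINT = "http://localhost:8080/reconcile/viaf/LC"


def build_queries(names, limit=3):
    """Map each name to a Reconciliation API query object keyed q0, q1, ..."""
    return {f"q{i}": {"query": name, "limit": limit} for i, name in enumerate(names)}


def encode_batch(names, limit=3):
    """Form-encode a batch as the `queries` parameter of a POST body."""
    return urlencode({"queries": json.dumps(build_queries(names, limit))})


def best_candidates(response):
    """Return the highest-scoring candidate per query key (None if no results)."""
    best = {}
    for key, payload in response.items():
        results = payload.get("result", [])
        best[key] = max(results, key=lambda r: r["score"]) if results else None
    return best


if __name__ == "__main__":
    # Placeholder name, not taken from the Hyacinth data
    print(ENDPOINT)
    print(encode_batch(["Doe, Jane"]))
```

Batching many names into one request, rather than sending one request per name, is what makes a job over tens of thousands of terms practical.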

Steps for working through the name reconciliation subproject

  1. Import a project with a name set into OpenRefine
    1. Click here to download the name sets (CUL login required).
    2. Checking potentially matched items in the name sets
      1. Setup
        1. Facet by reconciliation candidate score [see Fig. 1 and Fig. 2]
        2. Uncheck “blank”; only “numeric” should remain checked [Fig. 2]
      2. Follow the usual practices for reconciliation QA in OpenRefine, with the following modifications:
        1. Preferred label: For reasons unclear to me, the preferred label returned from VIAF for a match is often not the term's current preferred label (MARC 100) in the LCNAF. If you suspect that this is the case, do one of the following:
          1. Copy the correct preferred label from the LCNAF record into the PrefLabel column [preferred method].
          2. Put an “X” in the PrefLabel column to flag that the term needs further reconciliation/QA work [time-saving method]. See Fig. 3.
          3. NOTE: All label values have been “backed up” in the HyacinthLabel column. So if the value in the “value” column is not the preferred LCNAF label but the match label is correct, there is no need to do anything else: just click the double check mark to “match all.” The LCNAF preferred label will be extracted from the reconciliation result in a later phase of this project.
        2. No correct match returned: Click “create new item” and move on. After you finish your set, you can try reconciling these “new” terms against a different vocabulary, such as ISNI (or another source available via VIAF), ULAN (CUL internal wiki), or Wikidata (CUL internal wiki).
  2. Unmatched items: Review these to whatever extent you see fit. You could try running alternative reconciliations against vocabularies such as ISNI (or another source available via VIAF), ULAN (CUL internal wiki), or Wikidata (CUL internal wiki); you could batch this with the “new” items flagged above.
  3. If you notice anything odd, add a note in the “Problem” column. Separate multiple notes with semicolons.
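The “numeric only” facet setup can be approximated outside OpenRefine, for example when checking a subset whose best-candidate scores were first copied into their own column (GREL's `cell.recon.best.score` exposes that value). This minimal Python sketch assumes each row has been reduced to a hypothetical `Row` record holding the cell value and that score, with `None` standing for what the facet shows as “blank” (no candidates). The optional `low`/`high` bounds are an assumption standing in for narrowing the facet's range.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    value: str               # cell value as shown in OpenRefine
    score: Optional[float]   # best candidate's score; None means "blank"


def numeric_only(rows: List[Row], low: Optional[float] = None,
                 high: Optional[float] = None) -> List[Row]:
    """Keep rows that have a score (the "numeric" choice), optionally within [low, high)."""
    kept = []
    for row in rows:
        if row.score is None:          # equivalent of leaving "blank" unchecked
            continue
        if low is not None and row.score < low:
            continue
        if high is not None and row.score >= high:
            continue
        kept.append(row)
    return kept


def highest_first(rows: List[Row]) -> List[Row]:
    """Order scored rows from the most to the least confident candidate."""
    return sorted(numeric_only(rows), key=lambda r: r.score, reverse=True)
```

The bounds are half-open so that adjacent score ranges never count the same row twice.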
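The PrefLabel conventions in the steps above amount to a small decision rule for the later phase that extracts preferred labels. This Python sketch encodes one reading of it, assuming a hypothetical `ReviewedRow` record with the HyacinthLabel and PrefLabel values, the label of the confirmed match (`None` when “create new item” was chosen), and the Problem column: a corrected PrefLabel wins, an “X” flags the row for further QA, a blank PrefLabel falls back to the match label, and unmatched rows keep their Hyacinth label for now. The status strings and function names are invented for illustration; `problem_notes` splits the semicolon-delimited Problem column.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

FLAG = "X"  # time-saving marker: the label needs further recon/QA work


@dataclass
class ReviewedRow:
    hyacinth_label: str         # backup copy kept in the HyacinthLabel column
    pref_label: str             # PrefLabel column: blank, "X", or a corrected label
    match_name: Optional[str]   # confirmed match's label; None after "create new item"
    problem: str = ""           # Problem column, semicolon-delimited notes


def resolve_label(row: ReviewedRow) -> Tuple[Optional[str], str]:
    """Return (label to use, hypothetical status string) for one reviewed row."""
    pref = row.pref_label.strip()
    if pref.upper() == FLAG:                 # treat "x" the same as "X"
        return None, "needs-qa"              # flagged: resolve in a later pass
    if pref:
        return pref, "pref-label"            # label copied from the LCNAF record
    if row.match_name is not None:
        return row.match_name, "match"       # match label judged correct as returned
    return row.hyacinth_label, "unmatched"   # keep the Hyacinth label for now


def problem_notes(row: ReviewedRow) -> List[str]:
    """Split the Problem column on semicolons, dropping empty notes."""
    return [note.strip() for note in row.problem.split(";") if note.strip()]
```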
  • Fig. 1: How to add a facet based on reconciliation score
  • Fig. 2: How to set up the reconciliation facet for ease of navigation
  • Fig. 3: Use of the PrefLabel column. Note that Adedeji's and Albee's LCNAF matches both have trailing periods (full stops) for no discernible reason. Thus, before clicking to confirm the match, I've copied over the correct version from the NAF record, which in these cases happens to correspond to the existing Hyacinth label. For Afetinan and Aitken, note that the Hyacinth label is incorrect, but the match returned from the LCNAF via VIAF is completely correct, with no trailing punctuation or other discrepancies. You can leave PrefLabel blank in such cases: the Hyacinth label is retained in the HyacinthLabel column, and the preferred label can be extracted from the reconciliation data after the match is confirmed. If you are in a rush, just put an “X” in the PrefLabel column when the correct match has an incorrect label; the problems with Conciliator's labels usually involve punctuation, special characters, and capitalization. Don't go out of your way to double-check each item: use the PrefLabel column only when the suggested match appears correct but the preferred label supplied by the Conciliator reconciliation service has problems.
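Since Conciliator's label problems usually involve punctuation, special characters, and capitalization, a rough triage step could compare the Hyacinth label with the match label after normalizing those differences away. This Python sketch is an assumption-laden aid, not a substitute for checking the NAF record: `normalize` drops combining diacritics, ASCII punctuation, case, and extra whitespace, and the hypothetical `classify` helper reports whether two labels are identical, differ only by a trailing period, differ only in those normalized respects, or differ substantively.

```python
import string
import unicodedata


def normalize(label: str) -> str:
    """Fold away the differences Conciliator labels most often show."""
    # Decompose accented characters and drop the combining marks (ü -> u)
    decomposed = unicodedata.normalize("NFKD", label)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Remove ASCII punctuation, ignore case, and collapse runs of whitespace
    no_punct = no_marks.translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.casefold().split())


def classify(hyacinth_label: str, match_label: str) -> str:
    """Describe how a match label differs from the Hyacinth label."""
    if hyacinth_label == match_label:
        return "identical"
    if hyacinth_label.rstrip(".") == match_label.rstrip("."):
        return "trailing-period"
    if normalize(hyacinth_label) == normalize(match_label):
        return "normalized-match"   # only punctuation, case, diacritics, or spacing differ
    return "different"
```

A match classified as `trailing-period`, like the Adedeji and Albee matches in Fig. 3, might be a candidate for copying the correct label into PrefLabel; a `different` result still calls for a human look at the record.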