Hyacinth Name reconciliation subproject: Difference between revisions

From Columbia Wikibase Test
(Created guidelines for the name reconciliation subproject)
 
(Changed setup of figures -- unable to add images to WbStack instance)
 
Line 1: Line 1:
Preface: In September 2020, Ryan M. ran a large reconciliation job on all temporary name terms from the Hyacinth name and subject_name vocabularies using a locally installed instance of the Conciliator reconciliation service.  Specifically, VIAF was queried for matches in the LCNAF.  Because there are 45447 personal names classed as type “temporary,” we will focus on the "temporary" personal names.  There are much more manageable numbers of temporary corporate names.  Some goals of the project: 1) cleanup our name controlled vocabularies, especially as we may move to Hyacinth v.3 soon; 2) longer term: explore how this project could be operationalized to ensure ongoing maintenance of controlled vocabularies in Hyacinth
=== Steps for working the name reconciliation subproject ===
=== Steps for working the name reconciliation subproject ===
# Import a project with a name set into OpenRefine
# Import a project with a name set into OpenRefine
## [https://wiki.library.columbia.edu/display/metadata/Name+sets%3A+WbStack+name+reconciliation+subproject Click here to download the name sets] (CUL login required).  These name sets consist of temporary terms from Hyacinth Name and Hyacinth Subject-Name controlled vocabularies; an LCNAF reconciliation job has been run on these items, returning either potential matches or no matches.  The objective is to 1) manually confirm correct matches; 2) consider running additional reconciliation jobs with other controlled vocabularies like Wikidata or ULAN.  The total number of terms comprised by these name sets exceeds 45,000.
## [https://wiki.library.columbia.edu/display/metadata/Name+sets%3A+WbStack+name+reconciliation+subproject Click here to download the name sets] (CUL login required).
## Checking potentially matched items in the name sets
## Checking potentially matched items in the name sets
### Set up
### Set up
Line 12: Line 14:
##### NOTE: all label values have been “backed up” in the column HyacinthLabel.  So if the value in the “value” column is not the preferred LCNAF label, but the match label value is correct, no need to do anything -- just click the double check to “match all.”  The LCNAF preferred label will be grabbed from the reconciliation result in a later phase of this project
##### NOTE: all label values have been “backed up” in the column HyacinthLabel.  So if the value in the “value” column is not the preferred LCNAF label, but the match label value is correct, no need to do anything -- just click the double check to “match all.”  The LCNAF preferred label will be grabbed from the reconciliation result in a later phase of this project
#### No correct match returned: Click “create new item” and carry on. After you get through your set, you can try running a reconciliation on these “new” terms in a different vocabulary, such as [https://wiki.library.columbia.edu/display/metadata/Reconciling+values+against+LCNAF+and+VIAF+using+Codefork ISNI] (or another source available via VIAF), [https://wiki.library.columbia.edu/display/metadata/Reconciling+against+Getty+Vocabularies%3A+AAT%2C+TGN%2C+ULAN ULAN] (CUL internal wiki), or [https://wiki.library.columbia.edu/display/metadata/Using+the+Wikidata+Reconciliation+Service+and+Data+Extension+API+in+OpenRefine Wikidata] (CUL internal wiki)
#### No correct match returned: Click “create new item” and carry on. After you get through your set, you can try running a reconciliation on these “new” terms in a different vocabulary, such as [https://wiki.library.columbia.edu/display/metadata/Reconciling+values+against+LCNAF+and+VIAF+using+Codefork ISNI] (or another source available via VIAF), [https://wiki.library.columbia.edu/display/metadata/Reconciling+against+Getty+Vocabularies%3A+AAT%2C+TGN%2C+ULAN ULAN] (CUL internal wiki), or [https://wiki.library.columbia.edu/display/metadata/Using+the+Wikidata+Reconciliation+Service+and+Data+Extension+API+in+OpenRefine Wikidata] (CUL internal wiki)
# Unmatched items: Feel free to review these to whatever extent you see fit.  You could try running alternate reconciliations on vocabularies like [https://wiki.library.columbia.edu/display/metadata/Reconciling+values+against+LCNAF+and+VIAF+using+Codefork ISNI] (or another source available via VIAF), [https://wiki.library.columbia.edu/display/metadata/Reconciling+against+Getty+Vocabularies%3A+AAT%2C+TGN%2C+ULAN ULAN] (CUL internal wiki), or [https://wiki.library.columbia.edu/display/metadata/Using+the+Wikidata+Reconciliation+Service+and+Data+Extension+API+in+OpenRefine Wikidata] (CUL internal wiki); you could batch this with the “new” items flagged above (see 2bii)
# Unmatched items: Feel free to review these to whatever extent you see fit.  You could try running alternate reconciliations on vocabularies like [https://wiki.library.columbia.edu/display/metadata/Reconciling+values+against+LCNAF+and+VIAF+using+Codefork ISNI] (or another source available via VIAF), [https://wiki.library.columbia.edu/display/metadata/Reconciling+against+Getty+Vocabularies%3A+AAT%2C+TGN%2C+ULAN ULAN] (CUL internal wiki), or [https://wiki.library.columbia.edu/display/metadata/Using+the+Wikidata+Reconciliation+Service+and+Data+Extension+API+in+OpenRefine Wikidata] (CUL internal wiki); you could batch this with the “new” items flagged above.
# If you note any oddness, please add a note into the column “Problem.” Delimit notes with a semicolon.
# If you note any oddness, please add a note into the column “Problem.” Delimit notes with a semicolon.


Line 18: Line 20:
* [https://wiki.library.columbia.edu/display/metadata/WbStack+name+reconciliation+subproject Fig. 2]:  How to set up the reconciliation facet for ease of navigation
* [https://wiki.library.columbia.edu/display/metadata/WbStack+name+reconciliation+subproject Fig. 2]:  How to set up the reconciliation facet for ease of navigation
* [https://wiki.library.columbia.edu/display/metadata/WbStack+name+reconciliation+subproject Fig. 3]: Use of the PrefLabel column.  Note that Adedeji and Albee's LCNAF matches both have periods (full-stops), for no discernible reason.  Thus, before clicking to confirm the match, I've copied over the correct version from the NAF record, which happens in these cases to correspond to the existing Hyacinth label.  For Afetinan and Aitken, note that the Hyacinth label is incorrect, but the match returned from LCNAF via VIAF is completely correct – no trailing punctuation or other discrepancies.  You can leave PrefLabel blank: the Hyacinth Label is retained in Hyacinth Label, and the preferred label can be extracted from the reconciliation data after the match is clicked.  If you are in a rush, just put an "X" in the PrefLabel column if the correct match has an incorrect label – usually the problems with Conciliator have to do with punctuation, special characters, and capitalization.  Don't go out of your way to double-check each item – just use the PrefLabel column if the suggested match appears correct but there are some problems with the preferred label supplied by the Conciliator reconciliation service.
* [https://wiki.library.columbia.edu/display/metadata/WbStack+name+reconciliation+subproject Fig. 3]: Use of the PrefLabel column.  Note that Adedeji and Albee's LCNAF matches both have periods (full-stops), for no discernible reason.  Thus, before clicking to confirm the match, I've copied over the correct version from the NAF record, which happens in these cases to correspond to the existing Hyacinth label.  For Afetinan and Aitken, note that the Hyacinth label is incorrect, but the match returned from LCNAF via VIAF is completely correct – no trailing punctuation or other discrepancies.  You can leave PrefLabel blank: the Hyacinth Label is retained in Hyacinth Label, and the preferred label can be extracted from the reconciliation data after the match is clicked.  If you are in a rush, just put an "X" in the PrefLabel column if the correct match has an incorrect label – usually the problems with Conciliator have to do with punctuation, special characters, and capitalization.  Don't go out of your way to double-check each item – just use the PrefLabel column if the suggested match appears correct but there are some problems with the preferred label supplied by the Conciliator reconciliation service.
</gallery>
[Metadata Working Group > WbStack name reconciliation subproject > Screen Shot 2020-09-11 at 2.09.31 PM.png]
[Metadata Working Group > WbStack name reconciliation subproject > Screen Shot 2020-09-11 at 4.04.13 PM.png]

Latest revision as of 19:34, 23 September 2020

Preface: In September 2020, Ryan M. ran a large reconciliation job on all temporary name terms from the Hyacinth name and subject_name vocabularies, using a locally installed instance of the Conciliator reconciliation service. Specifically, VIAF was queried for matches in the LCNAF. Because 45,447 personal names are classed as type “temporary,” this subproject focuses on those; the temporary corporate names are far fewer and more manageable. Goals of the project: 1) clean up our controlled name vocabularies, especially as we may move to Hyacinth v.3 soon; 2) longer term, explore how this work could be operationalized to ensure ongoing maintenance of controlled vocabularies in Hyacinth.

Steps for working the name reconciliation subproject

  1. Import a project with a name set into OpenRefine
    1. Click here to download the name sets (CUL login required).
    2. Checking potentially matched items in the name sets
      1. Set up
        1. Facet by reconciliation candidate score [see fig. 1 and fig. 2]
        2. Uncheck “blank” -- Only “numeric” should be checked [fig. 2]
      2. Follow usual practices for reconciliation QA in OpenRefine, with the following modifications
        1. Preferred label: For reasons unclear to me, the preferred label returned from VIAF for a match is often not the current preferred label (MARC 100) of the term in the LCNAF. If you suspect that this is the case, please do one of the following:
          1. Put the preferred label into the PrefLabel column [preferred method]
          2. Put an “X” in the PrefLabel column; this flags the term for further recon/QA work [time-saving method]. See Fig. 3.
          3. NOTE: all label values have been “backed up” in the column HyacinthLabel. So if the value in the “value” column is not the preferred LCNAF label but the match label is correct, there is no need to do anything: just click the double check mark to “match all.” The LCNAF preferred label will be pulled from the reconciliation result in a later phase of this project.
        2. No correct match returned: Click “create new item” and carry on. After you get through your set, you can try reconciling these “new” terms against a different vocabulary, such as ISNI (or another source available via VIAF), ULAN (CUL internal wiki), or Wikidata (CUL internal wiki).
  2. Unmatched items: Feel free to review these to whatever extent you see fit. You could try running alternate reconciliations on vocabularies like ISNI (or another source available via VIAF), ULAN (CUL internal wiki), or Wikidata (CUL internal wiki); you could batch this with the “new” items flagged above.
  3. If you notice anything odd, please add a note in the “Problem” column. Delimit multiple notes with semicolons.
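
Once a name set has been worked, the PrefLabel and Problem conventions above can be read back mechanically. The following is a minimal sketch, not part of the project's actual tooling: it assumes the OpenRefine project has been exported to CSV with the HyacinthLabel, PrefLabel, and Problem columns, and the `triage` function and the sample rows are invented purely for illustration.

```python
import csv
import io


def triage(rows):
    """Sort exported rows into review buckets using the PrefLabel and Problem columns."""
    buckets = {"needs_label_qa": [], "label_supplied": [], "has_problem_notes": []}
    for row in rows:
        label = row["HyacinthLabel"]
        pref = (row.get("PrefLabel") or "").strip()
        if pref.upper() == "X":
            # "X" is the time-saving flag: the match is right, the label needs QA.
            buckets["needs_label_qa"].append(label)
        elif pref:
            # A reviewer typed the preferred label in by hand.
            buckets["label_supplied"].append(label)
        # Problem notes are delimited with semicolons.
        notes = [n.strip() for n in (row.get("Problem") or "").split(";") if n.strip()]
        if notes:
            buckets["has_problem_notes"].append((label, notes))
    return buckets


# Invented sample data standing in for an exported name set.
SAMPLE_CSV = '''HyacinthLabel,PrefLabel,Problem
"Doe, Jane",X,
"Roe, Richard","Roe, Richard, 1900-1980",
"Poe, Pat",,possible duplicate; dates look wrong
'''

if __name__ == "__main__":
    result = triage(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    for name, items in result.items():
        print(name, items)
```

Rows with a blank PrefLabel and no Problem note fall into no bucket, which matches the guidance that a correct match with a correct label needs no further action.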
  • Fig. 1: How to add a facet based on reconciliation score
  • Fig. 2: How to set up the reconciliation facet for ease of navigation
  • Fig. 3: Use of the PrefLabel column. Note that the LCNAF matches for Adedeji and Albee both end in periods (full stops), for no discernible reason. Before clicking to confirm each match, I therefore copied the correct form from the NAF record into PrefLabel; in these cases it happens to correspond to the existing Hyacinth label. For Afetinan and Aitken, the Hyacinth label is incorrect, but the match returned from LCNAF via VIAF is completely correct, with no trailing punctuation or other discrepancies. In such cases you can leave PrefLabel blank: the original Hyacinth label is retained in the HyacinthLabel column, and the preferred label can be extracted from the reconciliation data after the match is confirmed. If you are in a rush, just put an "X" in the PrefLabel column when the correct match has an incorrect label; the problems with Conciliator usually involve punctuation, special characters, and capitalization. Don't go out of your way to double-check each item: use the PrefLabel column only when the suggested match appears correct but the preferred label supplied by the Conciliator reconciliation service has problems.
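
Because the label problems described for Fig. 3 are mostly punctuation, special characters, and capitalization, a rough pre-check can separate purely cosmetic differences from substantive ones. This is a hedged sketch only: `normalize_label` and `classify` are hypothetical helpers, the normalization rules are assumptions rather than LCNAF or Conciliator behavior, and the sample names are invented.

```python
import unicodedata


def normalize_label(label):
    """Reduce a label to a form that ignores case, diacritics, and trailing punctuation."""
    decomposed = unicodedata.normalize("NFKD", label)
    # Drop combining marks so accented and unaccented forms compare equal.
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return no_marks.casefold().strip().rstrip(".,;: ")


def classify(hyacinth_label, match_label):
    """Return 'identical', 'cosmetic', or 'different' for a pair of labels."""
    if hyacinth_label == match_label:
        return "identical"
    if normalize_label(hyacinth_label) == normalize_label(match_label):
        return "cosmetic"
    return "different"


if __name__ == "__main__":
    print(classify("Doe, Jane", "Doe, Jane."))
    print(classify("Doe, Jane", "Doe, John"))
```

A "cosmetic" result only suggests that the PrefLabel column may be worth filling in or flagging with an "X"; it does not say which form is the authorized one, and a trailing period can be legitimate (for example after an initial), so the NAF record remains the reference.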