The national IPY data management in Norway is organised in a project (DOKIPY) in which databases at the Institute of Marine Research (IMR), the Norwegian Polar Institute (NPI) and the Norwegian Meteorological Institute (METNO) are synchronised at the metadata level using OAI-PMH serving DIF. We also plan to use OAI-PMH to synchronise IPY metadata with GCMD. Apart from the NPI, none of us has previous experience with DIF, so we have had some interesting discussions, especially about how to map between the various metadata standards. We have also discussed these issues with Melanie at GCMD, and she has agreed that we continue the discussion in this forum. At present we have two main issues for discussion:
Incomplete DIF records
In at least one of the databases, several metadata records have elements lacking proper DIF information because they were received before the IPY specification existed. We will of course complete these as soon as possible. However, we have decided to share them at the national level now (sometimes only minor information is missing), even though GCMD cannot handle these records until they are complete. Elements that are not properly covered will be filled with the dummy text "Not Available".
Does anyone else experience this problem, and if so, how is it handled? This is, however, not a big issue and will be resolved in time anyway.
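As a minimal sketch of the placeholder approach, the Python below fills missing or empty elements in a DIF record with "Not Available" and reports which ones it filled, so that a harvester could share such records nationally while holding them back from GCMD. The element list is a hypothetical subset assumed to contain simple text elements only, and `fill_missing` is an illustrative name; check the DIF specification for the actual required elements and their structure.

```python
import xml.etree.ElementTree as ET

PLACEHOLDER = "Not Available"

# Hypothetical subset of DIF elements, assumed here to be simple text fields.
# Complex elements (e.g. Parameters) would need their own handling.
REQUIRED_TEXT_ELEMENTS = ["Entry_ID", "Entry_Title", "ISO_Topic_Category"]


def fill_missing(dif_xml: str) -> tuple[str, list[str]]:
    """Put PLACEHOLDER into missing or empty required elements.

    Returns the updated XML and the names of the elements that were filled.
    A non-empty list marks the record as incomplete (share nationally only).
    """
    root = ET.fromstring(dif_xml)
    # Reuse the record's default namespace, if any, e.g. "{http://...}DIF".
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    filled = []
    for name in REQUIRED_TEXT_ELEMENTS:
        elem = root.find(ns + name)
        if elem is None:
            # Appended at the end of the record; a schema may require a fixed order.
            elem = ET.SubElement(root, ns + name)
        if not (elem.text or "").strip():
            elem.text = PLACEHOLDER
            filled.append(name)
    return ET.tostring(root, encoding="unicode"), filled
```

For example, `fill_missing("<DIF><Entry_ID>X</Entry_ID><Entry_Title></Entry_Title></DIF>")` reports `['Entry_Title', 'ISO_Topic_Category']` as filled. Because missing elements are simply appended, a real implementation would have to insert them wherever the DIF schema expects them.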
Changes in keywords
More problematic at present is how GCMD handles changes to the controlled vocabularies we use. In our discussion with GCMD we received the following information:
2) “How often do you see keywords change?”
GCMD: We have made additions to the keywords (note: every time we add a new platform or instrument, that is a change to the keywords). We are actually still quite dynamic (with additions). We expect to make significant changes to the Science Keyword list in 2009. Changes may include (1) renaming keywords, (2) deleting keywords, (3) adding new keywords, (4) changing where keywords lie in the hierarchy and (5) combining keywords.
What really worries us is maintaining mappings between GCMD keywords and other controlled vocabularies (e.g. Science Keywords to CF standard names) when GCMD plans to rename, delete or relocate keywords. In an operational context, keeping the mapping between DIF and other standards up to date becomes a huge task, because none of our databases use DIF internally; we all map to (and from) DIF.
We believe this will pose a problem for other IPY data management services as well. How do you plan to handle it? Could we make a coordinated approach to GCMD concerning the procedures for changing keywords, for example by asking for XML representations of the changes? Changes of this kind to a controlled vocabulary may cause a lot of extra work.
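To illustrate what an XML representation of keyword changes could buy us, here is a sketch that applies a change list to a local mapping from GCMD Science Keyword paths to CF standard names. The `<keywordChanges>` format and the function name are hypothetical, not something GCMD publishes. Renames and moves update the mapping automatically, while deletions and conflicting merges are set aside for manual review.

```python
import xml.etree.ElementTree as ET

# Hypothetical change-list format, e.g.
# <keywordChanges>
#   <change type="move" old="A > B" new="C > B"/>
#   <change type="delete" old="A > D"/>
# </keywordChanges>


def apply_keyword_changes(
    mapping: dict[str, str], changes_xml: str
) -> tuple[dict[str, str], list[str]]:
    """Apply GCMD keyword changes to a {gcmd_path: cf_standard_name} mapping.

    Returns the updated mapping and the keywords that need manual review.
    The input mapping is not modified.
    """
    updated = dict(mapping)
    review: list[str] = []
    for change in ET.fromstring(changes_xml).iter("change"):
        kind, old = change.get("type"), change.get("old")
        if old not in updated:
            continue  # keyword not used in our mapping; nothing to do
        if kind in ("rename", "move"):
            # Both a rename and a move change only the keyword path.
            new = change.get("new")
            cf_name = updated.pop(old)
            existing = updated.get(new)
            if existing is not None and existing != cf_name:
                # Combined keywords with different CF names: needs a human decision.
                review.append(new)
            else:
                updated[new] = cf_name
        elif kind == "delete":
            # Drop the mapping but flag it so affected datasets can be re-tagged.
            updated.pop(old)
            review.append(old)
        else:
            review.append(old)  # unknown change type: leave the mapping untouched
    return updated, review
```

Additions to the GCMD list leave existing entries alone, so only renames, moves, deletions and merges touch the mapping. Even with a machine-readable change list, new mapping entries and the review list still need a person.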
We are, however, happy with GCMD's plans to assign identifiers to the keywords in the controlled vocabularies. That will certainly make life easier in the future.
4) “To manage updates to keywords (simple changes to science keywords, data center names, project names and others), I think we would need each name to refer to an ID. The get_valids service (for example: http://gcmd.gsfc.nasa.gov/OpenAPI/get_valids.py?type=datacentervalid) could have an extra column giving the ID of each data center. At regular intervals we could then download the complete list and process it, using the ID to replace the names we already have in our database with the new ones. As the service is today, we cannot tell whether any of the names should replace names we already have. I guess more complex changes (one data center splits into two, or similar) could be handled by email.”
GCMD: We will be glad to inform you of changes by email. We plan to assign a specific identifier to each keyword in the future.
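The ID-based procedure in item 4 could look roughly like the sketch below. It assumes that the get_valids list gains an ID column and can be downloaded as tab-separated `id<TAB>name` lines; both are assumptions, since the current service has no ID column. The function `sync_valids` and the example IDs and names are hypothetical.

```python
import csv
import io


def sync_valids(
    local: dict[str, str], downloaded_tsv: str
) -> tuple[dict[str, str], dict[str, tuple[str, str]], list[str]]:
    """Update a local {id: name} table from a downloaded valids list.

    Assumes each data line of `downloaded_tsv` is "id<TAB>name".
    Returns (updated table, {id: (old_name, new_name)} renames, missing ids).
    """
    reader = csv.reader(io.StringIO(downloaded_tsv), delimiter="\t")
    remote = {row[0]: row[1] for row in reader if len(row) >= 2}
    updated = dict(local)
    renamed: dict[str, tuple[str, str]] = {}
    for id_, name in remote.items():
        if id_ in local and local[id_] != name:
            # Same ID, new name: replace the name stored in our database.
            renamed[id_] = (local[id_], name)
        updated[id_] = name  # also picks up entries that are new to us
    # IDs we hold that are no longer listed: possibly deleted, split or merged.
    # They are kept and reported for manual follow-up.
    missing = sorted(local.keys() - remote.keys())
    return updated, renamed, missing
```

With `local = {"1": "IMR", "2": "NPI", "3": "METNO"}` and a download containing `1\tNO/IMR` and `2\tNPI`, the rename `"1": ("IMR", "NO/IMR")` is applied and ID `"3"` is reported as missing. Missing IDs are reported rather than deleted because they may signal a split or merge, which would be handled by email as suggested in item 4.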
We would like some feedback on how other IPY data centres are handling these issues and, not least, how they should be handled within IPYDIS.
