VAMDC data policy

Data citation policy discussion

The goal for VAMDC is to give data consumers easier access to data and to ensure that data producers get proper credit, i.e. are cited.

See also DataCitations.

Current purpose of this page: discuss policies and practical aspects for citation and acknowledgement of data sources by VAMDC users. Eventually, this should result in a policy statement.

At EPT-10 (16 Mar 2011), GTR presented the results of discussions held with IOP Publishing (IOPP) about providing full data citations, extracted from XSAMS, in papers citing VAMDC. IOPP are interested, but the discussions also brought up a large number of open questions and uncertainties related to data source citation. See notes by GTR.

Add comments below. The questions are roughly divided into practical and policy questions; the central question is number 12:

  1. Practical: Issue of permanence of the URLs pointing to repositories in the cited data links. Storage/archiving of data extracted by the user from the VAMDC service: what information, if any, is stored by the publisher? The full XSAMS may be too large. External repositories might not be permanent, which could lead to link rot and data becoming unavailable. Cambridge University has a sustainable free repository, but this is not true of other places. Currently, links to data sets in VAMDC are semi-permanent.
    Comments:
  2. Policy : The publisher cannot archive all of the raw data, but how much of the data behind a figure do we, as a publisher, need to include with the archival article?
    Comments:
  3. Policy : Data citations in XSAMS have the potential to run into hundreds or thousands, so it is impractical to include all of them in current article reference lists. Do we assume it is up to authors to decide on the relevance of particular data citations for their paper? Do they choose based on criteria similar to those used for usual article citations? Do we assume it is also their responsibility to edit the VAMDC list of returned results?
    Comments: In the end, the responsibility for references will be with the authors, the referees, and the editors. The VAMDC consortium (EPT, SAB, ...) should provide guidelines for them, keeping in mind our goal to give proper credit to data producers. -- UlrikeHeiter - 12 Sep 2011
  4. Policy : A possible ‘side effect’ is that papers start being cited on the basis of a single data point from a dataset presented in the paper. Could this have an impact on citation counts and impact factors?
    Comments: If the single data point was important for the results of the paper citing it, the citation should be included. Guidelines for the meaning of "important" are needed. See question 3. -- UlrikeHeiter - 12 Sep 2011
  5. Practical: What format should be used for data citations? How could citation to the raw data be included? What do we need to consider, both for IOPP and for STM?
    Comments:
  6. Practical: Is there a need for a “data citation” service (remember Hypercite) to count these citations and provide an equivalent to an impact factor for data citations?
    Comments: Not sure I understand the question. -- UlrikeHeiter - 12 Sep 2011
  7. Policy : How do authors remember to cite data? Should referees and editors be enforcing citations?
    Comments: Yes! But to what extent and how? - Guidelines are needed - see questions 3, 4 and 8. -- UlrikeHeiter - 12 Sep 2011
  8. Practical: Validation of data sets: what is the role of referees in checking/validating supporting data? VAMDC: referees are expected to run a basic visual check of the data sets but cannot be expected to fully check/validate them. If so, what information do we send to referees and store in the editorial system? Is it practical to send referees the complete XSAMS file? Is there a subset of information we should provide and ask the referee to validate? Will this vary by subject area/journal? Do we require the complete XSAMS at submission, or could we ask for limited metadata and request more/all of the XSAMS upon acceptance?
    Comments:
  9. Practical: Data citation as a value-add feature for an article. Is there also an opportunity to use XSAMS data within the full text? Perhaps tables in their current form would no longer be required in future papers, as readers could make their own tables from the XSAMS file; we would just provide the interface. We could provide a number of “views” onto the table and then allow readers to manipulate them. If XSAMS is integrated into the full text, how is it done? Do we then have to store the complete XSAMS to guarantee future availability, or could we progressively enhance? That is, have a “fallback” table equivalent to how tables are included in research papers “today” and put the XSAMS on top of this, so that if the external XSAMS were unavailable the article would fall back and display a flat table containing the information it would have had if the XSAMS had not been used.
    Comments:
  10. Policy : Partner journals: we need an agreement with partners regarding their refereeing of data and usage of the VAMDC service (especially those partners that don’t use Atom for peer review and/or production).
    Comments:
  11. Practical: Indexing and abstracting services: some of these (i.e. Portico), but not all, will require the data to be archived with the article.
    Comments:
  12. Policy - central question: Integration in the submission process: possible in theory, but clear guidelines from VAMDC are needed. IOP and VAMDC would need to work on introducing best-practice guidelines for authors submitting articles to IOPP and data to VAMDC.
    Comments: The guidelines can be drafted based on comments on this list of questions, and circulated in EPT, SAB, IOPP, possibly other journals. -- UlrikeHeiter - 12 Sep 2011
  13. Practical: Issue of integrating the data in the final article and how we can use the metadata to help with discoverability of the data.
    Comments:
  14. Practical: Issue of duplicate citations: some of the citations in data sources might already exist in the reference list of an article.
    Comments:
  15. Practical: Issue of standardisation of the format: this is work in progress and will need more discussion with VAMDC.
    Comments:
  16. Practical: How would we display/grant access to the data behind a figure, table, etc.? See question 9.
    Comments:
  17. Practical: When will the XSAMS format become stable? If the specification is updated, will new versions be guaranteed to be backward compatible? Who will look after this XML data format in the longer term?
    Comments:
  18. Practical: The question of data changes and version control remains: what if VAMDC makes radical changes to data because it finds data sets that are of poor quality but are included in our papers? For "safety and preservation" reasons, should we archive the data for any papers we receive based on this "experiment", so that we have access to it irrespective of any changes?
    Comments:
  19. Policy : Does the use of this data cause any complications with copyright? Will we have to exclude XSAMS and any VAMDC-related content from our copyright?
    Comments: If the VAMDC standards and software are Open Source, then this will not be an issue, I assume. See Na1T5 -- UlrikeHeiter - 12 Sep 2011
  20. Practical: We need to know more about the technical requirements in detail, e.g. let's see some sample XSAMS files and get a better feel for them. How big, how complex, what can you do with them? It’s a valuable opportunity to learn more about data-centric publishing, but we really need to know a bit more about whether the atmol community is really interested, whether they are likely to use the VAMDC services or XSAMS data formats in their work, and how they would use them.
    Comments:
  21. Practical: How much data would we have to send on to third parties and how do we do that (i.e. what format and structure; should it be sent as part of the standard reference lists?). For the scheme to work, people like Thomson Reuters (for Web of Science), Elsevier (for Scopus) and NASA (for ADS) would need to be willing to take the greatly enlarged reference lists that would result, and afford them equal status to the reference lists in the printed version (if different). If these organisations were unable or unwilling to handle the data, it would limit the utility of the initiative and therefore the uptake.
    Comments:
  22. Practical: Standard data formats with clearly defined structures, considerations of data longevity and availability of data: to be addressed.
    Comments:
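
As a possible starting point for question 14 (duplicate citations), the following is a minimal sketch of how data-source citations could be merged into an article's existing reference list without duplicates. It does not read XSAMS: references are assumed to be already available as plain dictionaries, and the field names `doi`, `title` and `year`, as well as the functions `citation_key` and `merge_references`, are hypothetical and chosen only for illustration.

```python
import re


def citation_key(ref):
    """Return a comparison key: the normalised DOI if present, else title + year.

    `ref` is a plain dict with hypothetical keys "doi", "title", "year".
    """
    doi = (ref.get("doi") or "").strip().lower()
    # Strip common resolver prefixes so the same DOI compares equal.
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi)
    if doi:
        return ("doi", doi)
    # Fall back to a title with punctuation and case differences removed.
    title = re.sub(r"[^a-z0-9]+", " ", (ref.get("title") or "").lower()).strip()
    return ("title", title, ref.get("year"))


def merge_references(article_refs, data_refs):
    """Append data-source references that are not already in the article list."""
    seen = {citation_key(r) for r in article_refs}
    merged = list(article_refs)
    for ref in data_refs:
        key = citation_key(ref)
        if key not in seen:
            seen.add(key)
            merged.append(ref)
    return merged
```

Matching on DOI first and on a normalised title plus year otherwise is only one possible rule; references lacking both would need a different treatment, which ties in with the format standardisation raised in question 15.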

Topic revision: r3 - 24 Feb 2012 - 10:17:52 - UlrikeHeiter
 
