Study – Paul Keller
Big data analytics is increasingly ubiquitous, and is used by many different players, from large companies and individuals through to the research sector. The core of big data analytics, which is also one of the fundamental facets of Artificial Intelligence (AI), is the ability of computers to analyse and extract information from structured and unstructured datasets. This process is often referred to as “Text and Data Mining” in a legal context, or more widely as “data analytics”.
Given the ubiquity of Text and Data Mining, particularly in the US (which shows significantly higher levels of data exploitation than Europe), the Commission decided in 2016, for reasons of international competitiveness, to introduce a new copyright exception allowing EU-based research organisations to engage in Text and Data Mining for scientific purposes without having to obtain permission from rightsholders. During the legislative discussion of the DSM Directive proposal, an additional exception allowing Text and Data Mining by everyone under certain conditions was added to the text of the Directive [1].
Article 2(2) of the DSM Directive defines Text and Data Mining (TDM) as “any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations.”
Article 3 (Text and Data Mining for the Purposes of Scientific Research) requires member states to introduce a mandatory exception to copyright [2] in their national laws for the purposes of data analytics. This new exception gives researchers who have legal access to the open web, as well as to the collections of universities, libraries, archives and other cultural heritage organisations across the EU, the freedom to engage in Text and Data Mining for scientific purposes without requiring permission from rightsholders. Member states are encouraged to meet with research organisations, cultural heritage organisations and rightsholders to discuss appropriate security measures relating to the exercise of the exception. The right to enjoy this new exception cannot be removed by either contract or technical protection measures.
Article 4 (General exception for Text and Data Mining) requires member states to introduce a mandatory exception to copyright in their national laws for the purposes of Text and Data Mining by anyone who wishes to mine materials subject to copyright for any purpose. Rightsholders, however, can prevent data mining under this exception if they so choose.
The main difference between the two exceptions is that the research exception (Article 3) allows researchers affiliated to public interest organisations to keep a copy of the information they mined, and this cannot be prevented by contract or technical protection measures. The second exception (Article 4), which can be enjoyed by anyone, only allows Text and Data Mining to be performed on content for which rightsholders have not expressly reserved this right.
These two new mandatory exceptions have the potential to support big data analytics and artificial intelligence (AI) in Europe. The exceptions are relatively straightforward. However, poor transposition of a number of important details into national law could severely hamper the ability of beneficiaries of the exceptions to undertake Text and Data Mining. This concerns the following issues:
Rapid TPM removal
Technical Protection Measures [3] (TPMs for short, not to be confused with TDM) that prevent Text and Data Mining remain an area of very real concern. These can range from basic technical features such as captcha technology that can frustrate mining, through to more sophisticated technical protection measures. For example, scholarly publishers use systems that not infrequently end up blocking access to databases that universities have paid a subscription for. Often publishers monitor download rates, and if their systems are alerted to atypical download, request or load rates they may assume that part of the university's technical infrastructure has been compromised and cut off access. Both TDM exceptions are covered by Article 7, which specifies that Member States must not give legal protection to Technological Protection Measures (TPMs) that would prevent beneficiaries from exercising their rights under these exceptions and requires rightholders to remove TPMs where they conflict with the exercise of the rights granted under these exceptions. Unfortunately, the Directive provides little clarity about the process to be followed in order to remove them.
The Directive underlines that rightholders should, first of all, be given the possibility to remove technological protection measures that prevent enjoyment of the exception. This could be problematic if it leads to long delays for researchers engaging in TDM, with a need to apply for voluntary changes each time a TPM is encountered. It will be important to ensure that such a process is quick and simple in order to ensure that beneficiaries of the TDM exceptions do not face unreasonable delays. To achieve this goal, Member States should stipulate that TPMs have to be removed by the rightholders within 72 hours after a request (See also the separate section on Article 7 below).
Data storage
The Directive refers to the secure storage of content used for Text and Data Mining. Despite efforts by rightholders during the discussions in the European Parliament to require the deletion of datasets created in the course of Text and Data Mining (which runs entirely contrary to all scientific practice), Article 3(2) only refers to storing datasets “with an appropriate level of security”. It does not prescribe in any detail what this should mean. Recital 15, however, suggests that Member States discuss it, including the use of trusted intermediaries for storage. From the perspective of research institutions and researchers, it would be inappropriate for any detailed or technological expression of security measures to be imposed on researchers. Member States should refrain from imposing on these institutions any storage obligations that are more specific or more onerous than common sense dictates.
Robots.txt
Article 4(3) of the Directive specifies that rightholders who want to exclude their works from the scope of the general TDM exception can do so by expressly reserving their rights “in an appropriate manner, such as machine-readable means in the case of content made publicly available online”. The requirement that such reservations are made in a machine-readable way is welcome, but insufficient. For this provision to be effective (both for rightholders wishing to reserve their rights and for beneficiaries wishing to respect such reservations), such reservations need to be made in a standardized way. While the text of the Directive leaves the further implementation of this provision to the individual Member States, a standardized way of expressing the rights reservation must be established at the European level to prevent the risk that Member States adopt different standards. This means that the European Commission must actively work with all relevant stakeholders and the Member States to determine a standard for machine-readable rights reservations. The most obvious candidate for this is the Robots Exclusion Standard, which is adhered to by the largest Text and Data Mining operations on the internet, including Google, Bing, Baidu, DuckDuckGo, Yahoo!, and Yandex. Due to the search engines' commitment to follow these rules, nearly all websites on the planet follow the standard to control what can be mined by these bots, and it can easily be used to express the type of reservation foreseen in Article 4(3).
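To make this concrete, the following is a minimal sketch of how a mining tool could check such a machine-readable reservation before fetching a page. It uses the robots.txt parser from Python's standard library. The robots.txt content, the user agent name "example-tdm-bot" and the example.org URLs are hypothetical and chosen purely for illustration. Whether a robots.txt rule amounts to a valid reservation under Article 4(3) is exactly the question the Directive leaves open, so the sketch shows the technical check only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of a publisher that reserves its articles
# against one (made-up) mining bot and hides /private/ from all bots.
EXAMPLE_ROBOTS_TXT = """\
User-agent: example-tdm-bot
Disallow: /articles/

User-agent: *
Disallow: /private/
"""


def may_mine(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules do not exclude this URL
    for the given user agent (i.e. no reservation is expressed)."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; no network access needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("example-tdm-bot", "https://example.org/articles/1"),
        ("example-tdm-bot", "https://example.org/about"),
        ("other-bot", "https://example.org/articles/1"),
        ("other-bot", "https://example.org/private/x"),
    ]:
        print(agent, url, may_mine(EXAMPLE_ROBOTS_TXT, agent, url))
```

In practice a tool would download the file from the site's /robots.txt location instead of using an inline string. The sketch also illustrates why a common standard matters: a reservation can only be respected automatically if every rightholder expresses it in a form the mining tool already knows how to read.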
Implementation outlook
Big data analytics is an activity with a substantial economic impact and during the legislative procedure the (scientific) publishing sector has aggressively sought to protect its economic interests in this area. While the issues discussed above are mainly of a technical nature, they will have a big impact on the usefulness of the TDM provisions introduced by the Directive. It is therefore likely that publishers and other rightholders will seek to influence the national implementation in these areas and as a result the academic research community will need to closely monitor the national implementations and seek representation in the stakeholder consultations foreseen by Article 3(4).
Footnotes
[1] The additional exception was added in response to amendments proposed by both the Council and the European Parliament.
[2] In this document the term copyright(s) is used to mean both copyright and related (neighbouring) rights in copyright law, including sui generis database rights.
[3] Technical protection measures (TPMs) refer to locks, marks or other tools (e.g. password control systems, payment systems, time access controls, encryption measures, captcha technology, etc.) that control access to and/or what a user can do with a digital work, such as a book, video or any other file type.