
Canada | Publication | March 7, 2025
Developing high-performance generative AI systems, and other AI systems based on machine learning, often requires access to vast amounts of data to train the systems (AI training data) and to improve their accuracy and performance. Data scraping is one approach used to generate sufficiently large data sets. For example, some crawler tools compile web archive datasets that include both copyrighted and open-source works. The resulting data sets can be very large (petabytes of data).
These datasets have been used by researchers, developers, and analysts to advance fields such as language processing, search engine optimization, and web analytics. A challenge is that it is difficult for crawlers to distinguish between copyrighted or licensed materials and open-source materials, especially as many broadly accessible data sources do not accurately identify their data licensing terms, and accurate labelling at scale is very challenging. Accordingly, there are ongoing concerns about the inclusion of copyrighted materials in these large data sets and their derivatives.
Scraping can directly affect creators and owners of IP-protected works, especially when conducted without consent from, or payment to, rights holders. The OECD’s white paper contrasts data scraping with similar and related activities, such as “data mining” and “web crawling.” It refers to the OECD AI Principles and the OECD Recommendation on Enhancing Access and Sharing of Data, noting the need to balance concerns regarding privacy, data protection, and intellectual property against maximizing the benefits of data access and data sharing.
Data scraping can also be conducted by researchers using web scripts or other smaller-scale automation approaches against various data sources. Terms of service or terms of use may prohibit this behaviour, and they may operate alongside robot directives, such as robots.txt files or HTTP headers, that implement the Robots Exclusion Protocol. However, compliance with these prohibitions and directives is often not technologically enforced: the protocol effectively trusts crawler processes to respect the specific directives.
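To illustrate why compliance with robot directives depends on the crawler, the following is a minimal sketch in Python using the standard library's `urllib.robotparser` module. The robots.txt content, the crawler names (`ExampleAIBot`, `OtherBot`), and the example.com URLs are hypothetical and chosen only for illustration. The point is that the check runs inside the crawler: a crawler that skips the check is not stopped by the robots.txt file itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
# It blocks one named crawler entirely and blocks /private/ for all others.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the directives permit this user agent to fetch the URL.

    This is a voluntary check made by the crawler; nothing in the
    protocol prevents a crawler from ignoring the result.
    """
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# The named crawler is disallowed everywhere.
print(may_fetch(parser, "ExampleAIBot", "https://example.com/articles/1"))  # False
# Other crawlers fall under the wildcard rules.
print(may_fetch(parser, "OtherBot", "https://example.com/articles/1"))      # True
print(may_fetch(parser, "OtherBot", "https://example.com/private/report"))  # False
```

A well-behaved crawler would call a check like `may_fetch` before each request and skip disallowed URLs. Actually blocking a crawler that ignores the directives requires separate measures on the server side.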
Some data sources implement technical protection measures (TPMs) to restrict the activities of automated crawler processes, such as by technologically enforcing digital rights management (DRM) policies, but these can be challenging and expensive to implement without significantly impacting the functioning of a tool or a website.
As an alternative to data scraping, curated collections of data are available from dataset providers, including open-source data from academic preprint servers and paid repositories (e.g., of stock images). Licences can also be obtained for copyrighted materials.
The OECD’s white paper on IP issues in AI trained on scraped data is based on discussions by the OECD Working Party on AI Governance at its November 2023 and June 2024 meetings. These considerations are at the forefront of several ongoing disputes in which both IP infringement and breach of contract have been alleged.
The white paper proposes a broad working definition of scraping as the automated extraction of information from third-party websites, databases, or social media platforms. The automated processes can include web scraping, web crawling, and screen scraping, among others.
The white paper notes three general characteristics of scraping.
The white paper discusses different approaches to address issues posed by data scraping. These include a code of conduct, technical tools, standard contract terms, and raising awareness.
For more information, please contact your IP professional at Norton Rose Fulbright Canada LLP.
© Norton Rose Fulbright LLP 2025