(MENAFN - GlobeNewsWire - Nasdaq) The datasets fast-track video data from creator consent to AI-readiness

VILNIUS, Lithuania, June 2, 2025 – Oxylabs, a leading web intelligence platform and proxy provider, introduces industry-first YouTube datasets composed entirely of consent-based data. All of the millions of original videos in the datasets have the explicit consent of their creators to be used for AI training, helping to bridge the gap between creators and innovators.

“In the ecosystem aiming to find a fair balance between respecting copyright and facilitating innovation, YouTube streamlining consent giving for AI training and providing creators with flexibility is an important step forward. Many channel owners have already opted in for their videos to be used in developing the next generation of AI tools. This enables us to create and provide high-quality, structured video datasets. Meanwhile, AI developers have no trouble verifying the data's legitimate origin,” said Julius Černiauskas, CEO at Oxylabs.

All datasets offered by Oxylabs include videos, transcripts, and rich metadata. While such data has many potential use cases, Oxylabs refined and prepared it specifically for AI training, which is the use that the content creators have knowingly agreed to.

Large volumes of high-quality video data are fundamental for developing multimodal AI capable of seamlessly handling text, audio, and visual data when performing tasks or generating different types of content. Acquiring such data in a convenient way that establishes a transparent link between creators and AI companies is a challenge the industry is still trying to solve. Structured, AI-ready datasets from YouTube are now part of this developing, improved model for training AI on public data.

Importantly, consent-based datasets also allow AI companies and creators to be on the same page regarding fair AI development. That development has been riddled with still-unanswered questions about how copyrighted material can fuel rather than stall innovation.

“These datasets offer a breath of fresh air to a tense ecosystem in dire need of facilitating systematic cooperation between creators and AI companies based on mutual agreement. The next wave of tools that will shake the market can now be built on data that all can agree is right for AI training. Hopefully, this also marks a better, more sustainable way forward,” concluded Černiauskas.

The release of ethically sourced YouTube datasets continues Oxylabs' longtime mission to establish and promote ethical industry practices, previously marked by co-founding the Ethical Web Data Collection Initiative (EWDCI) and introducing an industry-first transparent tier framework for proxy sourcing.

