Could Lack of Access to Training Data Block Lightning-Fast AI Development?

Lon Harris
Lon Harris is a contributor to dot.LA. His work has also appeared on ScreenJunkies, RottenTomatoes and Inside Streaming.
Evan Xie

One perhaps under-scrutinized aspect of the AI revolution concerns how these various applications are trained. “ChatGPT learned about language by reading the internet” is the casual, tossed-off, shorthand version of the explanation. But the specific content that’s fed into these apps goes a long way in determining what kinds of outputs they ultimately generate. So if an AI application is only as good as the content that’s fed into it, would it then be fair to say that the creators and owners of that content deserve compensation?


This isn’t a purely academic or rhetorical question, but a pressing real-world issue that will soon require an answer. On Tuesday, news aggregation website Reddit announced plans to begin charging companies for access to its API, an early indication that it hopes to earn money in exchange for providing training materials to companies like OpenAI.

Reddit’s data is particularly appealing to OpenAI and other designers of so-called Large Language Models (LLMs). Unlike Google search results, Wikipedia pages, or other vast collections of writing and information, Reddit threads are already made up of real human beings engaged in conversations. They’re naturally going to be helpful in designing a chatbot that mimics real human speech and strives to create authentic interactions.

Additionally, Reddit content is constantly refreshed by its own users. They add headlines the moment news stories are posted and conversation threads get consistently refreshed with new commentary and real-time updates. This also helps LLMs and other AI systems to produce better, more accurate results. Both Google’s Bard and OpenAI’s ChatGPT were partly trained on Reddit data. In fact, ChatGPT cites Reddit as one of the primary sources for its training information.

It’s fair and accurate to say that Stable Diffusion and Midjourney produce original artwork, sure, and ChatGPT and Google’s Bard compose their own prose from scratch. But they’re only able to complete these tasks after scanning thousands of original drawings and millions of original sentences written by humans. In some ways, this is less “Artificial Intelligence” and more “Extensively Mashed-Up and Remixed Human Intelligence.” But that makes for a less appealing acronym.

From the perspective of an independent site like Reddit, or an image hosting service like Shutterstock, AI applications represent more than a way to squeeze additional financial value out of their pre-existing content libraries: over time, these applications could also emerge as rivals. ChatGPT could certainly one day power its own version of Reddit, scraping the web for fascinating news stories, writing attention-grabbing headlines, and posting them in forums to encourage reactions and discourse. OpenAI’s DALL-E is already used as an illustrator for web content. Obviously, a sufficiently advanced image generation tool could replace a stock photo library. So by charging up-front for its data, Reddit is also preparing early for a future in which its human users square off against automated rivals.

That said, Reddit is not alone in its concern about being scraped, or in its efforts to do something about it. Elon Musk has threatened to sue Microsoft over the use of his Twitter data. In January, Getty Images filed a lawsuit against Stable Diffusion creators Stability AI, alleging that the company “unlawfully copied and processed millions of images protected by copyright” in order to train its AI systems. Researchers from the University of Chicago have introduced Glaze, a beta application that adds imperceptible “perturbations” to artwork, intended to keep AI applications that scrape it from learning to copy the artist’s style and aesthetic.

According to emails obtained by the Financial Times, Universal Music Group – one of the industry’s largest labels – has asked streaming services like Spotify and Apple Music to limit AI access to their content, in an effort to prevent apps from scraping their songs and artists. This of course is no longer purely academic either. That “Heart On My Sleeve” AI-generated song that appears to feature both Drake and The Weeknd was only possible to create because the app had trained on Drake and Weeknd songs. As well, a number of tools and tutorials to stop ChatGPT from scraping your content have been released on the web.

From a legal perspective, issues around AI remain largely unresolved, including who can copyright the results of a human working with a generative AI application. Training adds yet another level of complexity to this question. Even if we one day establish that a person can copyright a piece of art that they created with help from an AI application… what if that AI application trained on work produced by a different human? Does the original artist whose creation was used to develop the software also own a piece of the final result? How are they compensated, if at all?

A Forbes editorial from February, written by a legal expert in AI, warns that generative AI is “rife with potential ethical and legal conundrums,” particularly when it comes to plagiarism and copyright infringement. Perhaps, over time, these issues will simply be worked out by judges and juries to everyone’s mutual benefit. But it’s also possible this could present a genuine roadblock for either artists and creators or the future of AI development. If AI companies are allowed to scrape whatever they please without compensation, and build the next generation of internet applications without input from humans, that leaves a lot of individual artists, writers, and creators out in the cold.

Conversely, if AI companies are not allowed to use anyone’s work to train their system without payment, we could be looking at the end of the lightning-fast development we’ve come to expect from the entire field. As we’ve seen repeatedly, AI applications are only as good as the library of content on which they’re trained. This is why China, with its internet pockmarked by banned content and censorship, has yet to create a true ChatGPT rival.

Standing Together Through the Flames

🔦 Spotlight

To our Los Angeles family,

This week’s wildfires have brought immense pain and hardship to our beloved city. Many of our friends, neighbors, and colleagues have faced evacuations, power outages, and the devastating loss of homes and livelihoods. Our hearts go out to everyone affected by this tragedy.

At dot.LA, we want to express our deepest sympathy to those suffering in this moment. We see your resilience and stand with you during this challenging time. This community has always been defined by its strength and compassion, and now is the time to come together in support.

If You or Someone You Know Has Been Impacted, Resources Are Available:

Evacuation Shelters:

  • Calvary Community Church: 5495 Via Rocas, Westlake Village, CA 91362
  • Ritchie Valens Recreation Center: 10736 Laurel Canyon Blvd., Pacoima, CA 91331
  • Pan Pacific Recreational Center: 7600 Beverly Blvd., Los Angeles, CA 90036
  • Westwood Recreation Center: 1350 Sepulveda Blvd., Los Angeles, CA 90025
  • Pasadena Civic Auditorium: 300 East Green Street, Pasadena, CA 91101
  • Pomona Fairplex: 1101 W McKinley Ave, Pomona, CA 91768
  • Stoner Recreation Center: 1835 Stoner Ave, Los Angeles, CA 90025

Animal Shelters:

Small Animals:

  • Agoura Animal Care Center: 29525 Agoura Rd, Agoura Hills, CA 91301
  • Baldwin Park Animal Care Center: 4275 Elton St, Baldwin Park, CA 91706
  • Carson Animal Care Center: 216 W Victoria St, Gardena, CA 90248
  • Downey Animal Care Center: 11258 Garfield Ave, Downey, CA 90242
  • Lancaster Animal Care Center: 5210 W Ave I, Lancaster, CA 93536
  • Palmdale Animal Care Center: 38550 Sierra Hwy, Palmdale, CA 93550

Large Animals:

  • Pomona Fairplex: 1101 W McKinley Ave, Pomona, CA 91768
  • Industry Hills Expo: 16200 Temple Ave, City of Industry, CA 91744
  • Antelope Valley Fair: 2551 W Avenue H, Lancaster, CA 93536
  • Los Angeles Equestrian Center: 480 W Riverside Dr, Burbank, CA 91506
  • Pierce College Equestrian Center: 7100 El Rancho Dr, Woodland Hills, CA 91371

Disaster Relief Information:

  • LA County Assessor: Information for property owners and FAQs about disaster relief.

Mental Health Support:

  • Los Angeles County Department of Mental Health: Crisis counseling and support for those affected. Access services through their website or call their hotline at (800) 854-7771.

Temporary Housing Support:

  • Airbnb: In partnership with 211 LA, offering free temporary housing for displaced residents. Spaces are limited; complete the form to be notified of availability.

Transportation Support:

  • Uber: Use promo code WILDFIRE25 for 2 free rides up to $40 each to/from active shelters.
  • Lyft: Code CAFIRERELIEF25 offers 2 rides up to $25 each for up to 500 riders, valid until 1/15.
  • Metro: Fare collection is suspended systemwide.

Staying Informed:

  • Watch Duty App: Provides real-time wildfire tracking, evacuation warnings, and updates.
  • Los Angeles Fire Department Alerts: Visit their website for the latest information on fire status and safety guidelines.

Safety Precautions:

  • Ready, Set, Go!: Personal Wildfire Action Plan by the Los Angeles County Fire Department.

To those in our community who are volunteering, donating, or offering aid in any form—thank you. Your efforts embody the spirit of LA: strong, compassionate, and unstoppable.

At dot.LA, we’re committed to amplifying stories of resilience and support. If you’ve seen inspiring acts of kindness or have resources to share, please let us know. Together, we can shine a light on the incredible ways this community is stepping up during these trying times.

In the days ahead, let’s hold tight to the bonds that unite us and remember that we are stronger together. The fires may scar the land, but they cannot dim the collective spirit of Los Angeles.

We’re here for you, and we’re with you.

    Download the dot.LA App

    A Strong Finish to 2024 for LA Tech: Crosscut Ventures Leads the Way

    🔦 Spotlight

    Happy Friday LA!

    As we close the book on 2024, Los Angeles has had a remarkable year in tech and venture capital. From groundbreaking funding rounds to industry-defining innovations, the city’s tech ecosystem has showcased its ability to adapt and thrive. Among the year’s final highlights was the announcement that Crosscut Ventures, one of LA’s premier early-stage venture capital firms, has added Jon Ylvisaker as its newest Partner.

    Crosscut Ventures’ Bold New Direction

    Announced in late December, Jon Ylvisaker’s appointment reflects Crosscut Ventures’ commitment to advancing its focus on the energy transition. Ylvisaker brings decades of experience in driving investments in energy technologies and digital infrastructure. As the founding partner and managing director of Yield Capital Partners, he led investments in startups and established companies shaping the future of sustainability. At Wolfacre Global Management, a Tiger Management hedge fund, he further honed his expertise in supporting impactful climate-focused solutions.

    Brian Garrett, Managing Director and Co-Founder of Crosscut Ventures, said, “Jon's extensive experience in climate and digital infrastructure investments, coupled with his impressive track record of bringing groundbreaking technologies to market, makes him the ideal partner to help lead our focus.”

    Since its founding in 2008, Crosscut has played a key role in shaping LA’s tech landscape. Ylvisaker’s addition reinforces the firm’s commitment to addressing global challenges like energy transition and sustainability, further solidifying its leadership in venture capital innovation.

    What’s Next for LA Tech in 2025

    The momentum from 2024 has set the stage for an even bigger year ahead. Entrepreneurs, investors, and innovators in LA are poised to take on new challenges and create meaningful change across industries.

    As we step into 2025, we want to thank everyone who helped make 2024 such a standout year. Here’s to another year of progress, innovation, and success. From all of us at dot.LA, Happy New Year!

    🤝 Venture Deals

    LA Companies

    • First Resonance, a company specializing in digital manufacturing software through its ION Factory OS, has raised a $20M funding round led by Third Prime with participation from Blue Bear Capital and others. This brings its total funding to $36M and will be used to accelerate product development, grow its customer base, and enhance support for advanced manufacturing sectors like aerospace, robotics, and clean energy. - learn more

    LA Venture Funds

    • Finality Capital Partners led a $17M Seed funding round for ChainOpera AI, a California-based company developing blockchain networks for AI-powered agents and applications, to accelerate product development, expand its team and enhance its blockchain and AI integration capabilities. - learn more

    LA Exits

    • Thirteen Lune, an inclusive beauty e-commerce platform, has been acquired by SNR Capital, marking a significant milestone in the platform's mission to amplify underrepresented beauty brands while fueling its next stage of growth. - learn more
    • Ergobaby, a leading brand in juvenile products known for its high-quality baby carriers, has been acquired by Highlander Partners. The acquisition aims to bolster Ergobaby’s growth, expand its product offerings, and strengthen its position in the parenting solutions market. - learn more


    Salt AI’s $3M Bet, Snapchat’s Creator Cash, Rivian’s EV Tech, and ŌURA’s $200M Win

    🔦 Spotlight

    Happy Friday, LA - let’s dive right in to this week’s highlights:

    Salt AI, a forward-thinking AI startup based in Los Angeles, has secured a $3 million seed funding round led by Morpheus Ventures with participation from Struck Capital, among others, to tackle the complexity of managing workflows. Salt AI's blog details how its platform centralizes tools like CRM systems, project management software, and data trackers into one interface, eliminating inefficiencies and freeing up teams to focus on meaningful work. With new funding in hand, Salt plans to scale its platform and expand its reach, a move that underscores how AI can solve everyday business challenges.

    Image Source: Salt AI - Aber Whitcomb

    While Salt AI focuses on the workplace, Snapchat is doubling down on creators. The platform’s latest updates introduce revenue-sharing opportunities and direct monetization features, and the company’s newsroom update outlines how enhanced analytics will help creators better understand their audiences and sustain their work. By making it easier for creators to grow, Snapchat positions itself as a key player in the creator economy, offering features that rival those of platforms like YouTube and TikTok.

    Image Source: Snap

    On the roads, Rivian is redefining what it means to drive an electric vehicle. The company’s latest software update includes advanced route planning, energy management tools, and customization options that make every trip more intuitive and efficient. Additionally, Rivian has introduced new entertainment features, including Google Cast, YouTube, and SiriusXM, as featured in Rivian’s software spotlight, enhancing the in-cabin experience for drivers and passengers alike. This isn’t just about convenience; Rivian is showing how thoughtful software design can elevate the entire EV experience, blending practicality with sophistication.

    Image Source: Rivian

    ŌURA is making headlines with a fresh $200 million Series D funding round, with participation from Fidelity Management & Research Company and Dexcom, which now values the company at $2.55 billion. This investment, as reported by Business Wire, highlights the growing demand for wearable health technology and positions ŌURA as a leader in the space. Known for its sleek design and emphasis on actionable health insights, ŌURA plans to use the funding to reach more users, expand its capabilities, and further integrate wearables into daily health management, strengthening its position in the competitive health tech market.

    Image Source: ŌURA

    Stay tuned as Salt AI, Snapchat, Rivian, and ŌURA continue to evolve, offering us new ways to work, connect, and live better.

    🤝 Venture Deals

      LA Venture Funds
        • Undeterred Capital participated in a $7M Seed funding round for Portal, a Watertown, Mass.-based biotech company specializing in advanced intracellular delivery technology to drive innovations in biological research and cellular therapeutics. - learn more
        • Vamos Ventures participated in a $7.9M Series A funding round for Culina Health, a Hoboken, NJ-based company that provides personalized, science-based virtual nutrition care by connecting patients with registered dietitians, with plans to use the funds to expand its offerings for dietitians and patients, implement AI-driven tools to enhance care efficiency, and strengthen its leadership team through key hires. - learn more
        • Humans Ventures participated in a $3.8M Seed funding round for Hamming.ai, a San Francisco-based company specializing in automated tools for testing and optimizing voice agents, with plans to expand its platform, enhance reliability and performance, and accelerate product development. - learn more
        • Fifth Wall led, with participation from Starshot Capital and others, a $9.5M Series A funding round for Mojave, a Sunnyvale, CA-based company developing energy-efficient commercial air conditioning technology. The funds will be used to accelerate the adoption of its innovative systems and reduce energy consumption in the cooling industry. - learn more
        • ReMY Investors participated in a $17M Series B funding round for Scripta Insights, a company that leverages data analytics to help employers and health plans reduce prescription drug costs, with the funds aimed at expanding its platform and scaling operations. - learn more
        • Mantis VC participated in a $16.5M funding round for Nuon, a company specializing in Bring Your Own Cloud (BYOC) solutions that streamline AI, data, and infrastructure software deployment. The funds will support product development, readiness for general availability in 2025, and efforts to expand customer acquisition. - learn more
        • B Capital participated in a $102M Series C funding round for Precision, a company developing minimally invasive brain-computer interfaces to treat neurological disorders, with plans to use the funds to expand its team, advance clinical research, and refine its AI-powered brain implant for helping users with severe paralysis operate digital devices using their thoughts. - learn more
        • The Games Fund led a $3M Seed funding round for Dark Passenger, a Poland-based game studio founded by veterans of The Witcher 3 and Cyberpunk 2077, to create an unannounced, innovative, first-person multiplayer PvPvE stealth-action game set in a distinctive universe inspired by feudal Japan and martial arts cinema. - learn more

            LA Exits

            • Calliope Networks, a generative AI company providing licensed media content like movies, TV shows, and news, has been acquired by Protege to strengthen its platform’s capabilities in advancing AI development. - learn more

