Week 8 - Reflections on Open Source AI

This week, Nick Vidal from OSI was invited to give a guest lecture on Open Source AI in our class. Click to see my thoughts on this talk!

Reflections on Nick Vidal’s Talk

This week, Nick Vidal gave a presentation on the Open Source AI Definition 1.0 by Open Source Initiative. Though the backbone of Open Source Definition for AI is quite similar to that for traditional software, i.e, people have the rights to use, study, modify, and share any piece of Open Source AI, I believe the characteristics of AI induces extra challenge that people have not experienced before the era of deep learning.
Due to the curse of the Scaling Law (though it’s a vital discovery for the advancement of AI, I’m calling it curse because brings extra challenges for open source and privacy of AI), modern AI models are monstrous, data-hunger, and require insane computing power. Under such circumstances, the definition of Open Source AI should be taken with care: not only the code should be open-sourced, but also the training data, model parameters, and training setups. However, open-sourcing these properties results in the concerns of data privacy, AI hallucination and harmful-ness.
One great point of the Open Source AI Definition 1.0 is that they divided data information to 4 categories and proposed different requirements for each type. Still, I believe there is a long way to go to estabish a recognized standard for open-sourcing AI because:

the blackbox nature of deep models makes it hard to interpret AI, and often impairs the study and modification of open-source AI;
running AI often require significant computing power, and consequently many people cannot use and study AI even if they have the code, data, and parameters;
many tech giants keep their cutting-edge AI products close-source for commercial purposes, making the academia (as well as open-source community) lagging behind the industry considerably.

Connection to Our Final Project (Huggingface)

Huggingface is one of the most popular open-source AI platform around the globe. Corresponding to the requirements of Open Source AI Definition 1.0, Huggingface open sources code, datasets, and model parameters in different sections. This semester, our team will focus on the code part (as contributing to data or model weights often require significant computing power and numerous experiments), trying to fix existing bugs and implement new features at request.

Written before or on March 15, 2025