Hugging Face
Hugging Face, the leading platform for open source machine learning projects, has made a strategic acquisition of XetHub, a Seattle-based startup specializing in file management for artificial intelligence projects. The acquisition significantly enhances Hugging Face's AI storage capabilities, enabling developers to work with larger models and datasets more efficiently.
XetHub was founded by Yucheng Low, Ajit Banerjee and Rajat Arya, who previously worked at Apple building and scaling its internal ML infrastructure and bring extensive experience in machine learning and data management. Low also co-founded Turi, an ML/AI company that Apple acquired in 2016.
The startup has raised $7.5 million in seed funding led by Seattle-based venture capital firm Madrona Ventures.
To see why this acquisition matters, it helps to understand Git Large File Storage (LFS). Git LFS is an open source Git extension that keeps large files out of the regular repository history: the repository stores small pointer files, while the file contents live on a separate server. Hugging Face currently uses Git LFS as its storage backend, but the system has limitations. For example, when developers update AI models or datasets on Hugging Face's platform, they must re-upload the entire file even if only a small part of it changed, which can be time-consuming for files containing gigabytes of data.
XetHub's platform addresses this by splitting AI models and datasets into smaller pieces. Developers then upload only the pieces that have changed instead of re-uploading the entire file. The result is significantly faster uploads, which helps keep AI development workflows agile.
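The idea of uploading only changed pieces can be illustrated with a minimal sketch. This is not XetHub's implementation: it assumes fixed-size chunks identified by SHA-256 digests, and the names `CHUNK_SIZE`, `chunk_hashes` and `chunks_to_upload` are hypothetical. Production systems commonly pick chunk boundaries from the content itself so that inserted bytes do not shift every later chunk; fixed-size chunks are used here only to keep the example short.

```python
import hashlib

# Tiny chunk size so the example is easy to follow (hypothetical value;
# a real system would use chunks of many kilobytes).
CHUNK_SIZE = 4


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return each chunk's SHA-256 hex digest."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def chunks_to_upload(old: bytes, new: bytes, chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Return indices of chunks in `new` whose content is not already stored for `old`."""
    known = set(chunk_hashes(old, chunk_size))
    return [
        index
        for index, digest in enumerate(chunk_hashes(new, chunk_size))
        if digest not in known
    ]


old_version = b"AAAABBBBCCCCDDDD"
new_version = b"AAAABBBBXXXXDDDD"  # only the third chunk differs
print(chunks_to_upload(old_version, new_version))  # [2]
```

In this toy case only one of four chunks needs to be sent, whereas a whole-file scheme would re-upload all sixteen bytes.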
XetHub's platform also offers features that streamline the AI development process, including:
Advanced version control: Accurately tracks changes across iterations of AI models and datasets.
Collaboration tools: Facilitate teamwork on complex AI projects.
Neural network visualization: Provides an intuitive representation of AI model architecture for easier analysis and optimization.
By integrating XetHub's technology, Hugging Face is poised to overcome its current storage limitations. With this upgrade, the platform will be able to support individual files over 1 TB and repository sizes over 100 TB, hosting significantly larger models and datasets. This capability is essential to Hugging Face's goal of maintaining the most comprehensive collection of foundational model and dataset resources in the world.
Hugging Face's acquisition of XetHub brings a range of significant benefits to the platform's users. Developers can expect increased productivity due to significantly faster upload times for large AI models and datasets, enabling faster iteration and deployment cycles. Collaboration between distributed AI development teams will become more efficient, improving teamwork and knowledge sharing. The integration also enables robust version control capabilities, improving traceability and reproducibility of machine learning workflows, which are essential for maintaining quality and consistency in AI projects. Perhaps most importantly, the acquisition will increase scalability, enabling support for larger, more complex AI projects that push the limits of current technology, opening up new possibilities for innovation and advancements in the field of artificial intelligence.
As AI continues to evolve, the ability to handle large models and datasets efficiently becomes increasingly important. Recent advances in areas such as large language models (e.g., Meta Llama, Google Gemma) and computer vision depend on large datasets and increasingly complex model architectures. Hugging Face's improved infrastructure will help developers keep up with these rapid advancements, potentially sparking new breakthroughs in AI research and applications.
With the XetHub integration, working with models and datasets on Hugging Face becomes more like working with Docker, which uses a layered file system. Just as Docker users transfer only the image layers that have changed rather than entire container images, developers can pull or push only the parts of files that have changed.
This strategic acquisition by Hugging Face accelerates the democratization of AI technology. By removing the technical barriers associated with managing large-scale AI projects, Hugging Face makes advanced AI development more accessible to a global community of researchers, developers and companies.
Hugging Face's acquisition of XetHub is an important step in accelerating the adoption of open-weight models. By addressing significant limitations in data storage and management, the move solidifies Hugging Face's leadership position in the AI development ecosystem.