Roboflow expands open source datasets for better computer vision AI models

All machine learning libraries and projects rely on data to learn, train, and run.

To help developers more easily take advantage of labeled datasets and computer vision machine learning models, Roboflow today announced an expansion of the datasets and artificial intelligence models available through its Roboflow Universe initiative, which could be one of the largest open source repositories of its kind. Roboflow claims that Roboflow Universe, which launched in August 2021, now holds more than 90,000 datasets comprising more than 66 million images.

Roboflow was founded in 2019 and raised $20 million in a Series A funding round in September 2021. The company provides an open source repository of computer vision datasets and models as well as data labeling, model development, and hosting capabilities. Its business model is to offer free tiers of service to entry-level users; as usage grows, or for organizations that work with private datasets, it sells paid support and service options.

Roboflow Universe is not just about providing images that a developer can use; it’s about providing images that are formatted so that the datasets can be used in AI-powered applications.

“A project is basically something that has a dataset that anyone can use and a trained model on top of that dataset,” Joseph Nelson, co-founder and CEO of Roboflow, told VentureBeat. The dataset is the images plus annotations.

Data is beautiful, labeled data is more beautiful

Organizations typically spend a significant amount of time preparing machine learning data, Nelson said.

The data preparation process includes labeling and classifying the data so that a model can be trained effectively. Nelson said labeling in Roboflow Universe isn’t just a description of an image, either.

Labels in a Roboflow Universe dataset can include a bounding box, which draws a box around an object and can be useful for object detection in a crowded scene. Another annotation type that Roboflow supports is instance segmentation, which provides a polygon that traces the outline of the object of interest.
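
To make the difference between the two label types concrete, here is a minimal Python sketch. It is not Roboflow code: the function names and the triangular outline are hypothetical, chosen only to show that a polygon label can describe an object more tightly than the box that encloses it.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_to_bbox(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Return the tightest axis-aligned box (x_min, y_min, width, height) around a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    x_min, y_min = min(xs), min(ys)
    return (x_min, y_min, max(xs) - x_min, max(ys) - y_min)


def polygon_area(polygon: List[Point]) -> float:
    """Area of a simple (non-self-intersecting) polygon via the shoelace formula."""
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical triangular object outline, in pixel coordinates.
triangle = [(10.0, 10.0), (50.0, 10.0), (30.0, 40.0)]

box = polygon_to_bbox(triangle)       # bounding-box style label
box_area = box[2] * box[3]            # pixels covered by the box
mask_area = polygon_area(triangle)    # pixels covered by the polygon
```

In this example the polygon covers half the area of its bounding box, which is the kind of extra precision instance segmentation labels carry.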

The annotation formats used in machine learning are often complex and varied. To that end, Nelson said that Roboflow supports exporting datasets to 36 annotation formats. Among the supported formats are COCO JSON, Pascal VOC XML, and YOLO Darknet TXT.
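
As an illustration of why format conversion matters, the sketch below converts a single bounding box between two of the formats named above. It relies on the commonly documented conventions: COCO stores a box as [x_min, y_min, width, height] in pixels, while a YOLO Darknet TXT line stores a class index followed by the box center and size, normalized by the image dimensions. The function names are hypothetical, and this is not how Roboflow itself performs the export.

```python
from typing import Sequence, Tuple


def coco_to_yolo(
    bbox: Sequence[float], image_width: int, image_height: int
) -> Tuple[float, float, float, float]:
    """Convert a COCO pixel box to normalized YOLO (x_center, y_center, width, height)."""
    x_min, y_min, width, height = bbox
    x_center = (x_min + width / 2) / image_width
    y_center = (y_min + height / 2) / image_height
    return (x_center, y_center, width / image_width, height / image_height)


def yolo_line(
    class_id: int, bbox: Sequence[float], image_width: int, image_height: int
) -> str:
    """Format one annotation as a line of a YOLO Darknet TXT label file."""
    values = coco_to_yolo(bbox, image_width, image_height)
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in values)


# Hypothetical annotation: class 0, a 64x48 pixel box in a 640x480 image.
line = yolo_line(0, (128, 96, 64, 48), 640, 480)
print(line)
```

The same object ends up described by very different numbers in each format, which is why a one-step export saves time when moving a dataset between training frameworks.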

“Having the image data widely available and usable means anyone can instantly find a dataset, pull it into a training pipeline, and get up and going,” Nelson said.

How developers integrate Roboflow Universe datasets into applications

Bringing computer vision datasets and models into AI-powered applications often involves a complex integration.

Nelson’s goal with Roboflow is to help reduce that complexity. He said that Roboflow Universe datasets can be accessed via open APIs. For example, he pointed out that Roboflow has a Python package hosted on the Python Package Index (PyPI) that enables developers to programmatically pull images, annotations, and models and then embed those components directly into an application.
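
A minimal sketch of what pulling a dataset with that package might look like follows. It is based on the usage pattern shown in the `roboflow` package's own quickstart material as best understood here, so treat the call chain as an assumption that may differ between package versions. The wrapper function, its parameter names, and every value in the usage comment are placeholders, not real identifiers.

```python
def download_universe_dataset(
    api_key: str,
    workspace: str,
    project: str,
    version: int,
    export_format: str = "coco",
) -> str:
    """Download one version of a dataset and return the local folder it was saved to.

    Assumes the `roboflow` package is installed (pip install roboflow) and that
    the caller has a valid API key. All arguments are supplied by the caller;
    nothing here is specific to a real workspace or project.
    """
    # Imported lazily so this sketch can be loaded without the package installed.
    from roboflow import Roboflow

    rf = Roboflow(api_key=api_key)
    dataset = (
        rf.workspace(workspace)
        .project(project)
        .version(version)
        .download(export_format)  # export format identifier, e.g. "coco"
    )
    return dataset.location


# Example call with placeholder values (requires network access and a real key):
# folder = download_universe_dataset("YOUR_API_KEY", "some-workspace", "some-project", 1)
```

The returned folder would contain the images and annotation files in the requested format, ready to hand to a training pipeline.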

Deploying a Roboflow Universe model to popular cloud machine learning services, including AWS SageMaker and Google Vertex AI, is a straightforward process via an API call, according to Nelson. In addition, Roboflow provides datasets and models as Docker containers, enabling deployment on edge devices. There is also a software development kit (SDK) for Apple iOS devices.

“If we make it very easy to use a model wherever you want to use it, then ideally an engineer would focus their time on the thing their business logic actually does,” Nelson said.

The intersection of open source models with artificial intelligence bias

Making it easier to access computer vision datasets and models for building applications is a major goal of Roboflow. Another effect of having such a large amount of open source data is that it can help address concerns about AI bias.

“Bias in AI is never a solved problem,” Nelson said. “But providing explainability, accessibility, and discovery can help.”

Nelson explained that AI bias is often about trying to understand why a model makes a particular decision. Essentially, the way models make decisions depends on the data on which they are trained. By training on a larger dataset that includes more diversity, a model can become more representative, with less risk of bias.

“Ultimately, a lot of the bias problems in AI stem from underrepresentation,” Nelson said. “The way to fix the underrepresentation is to enable active collection of datasets for the underrepresented class, and to make that data accessible, searchable, and usable.”
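
One simple way to spot the underrepresentation Nelson describes is to count how many labels each class has in a dataset. The sketch below does this for a COCO-style annotation dictionary. The helper names, the sample data, and the 20% threshold are all hypothetical choices for illustration, not a Roboflow feature or a recommended cutoff.

```python
from collections import Counter
from typing import Dict, List


def class_counts(coco: Dict) -> Counter:
    """Count annotations per category name in a COCO-style dictionary."""
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    # Start every class at zero so classes with no labels still show up.
    counts = Counter({name: 0 for name in names.values()})
    for annotation in coco["annotations"]:
        counts[names[annotation["category_id"]]] += 1
    return counts


def underrepresented(counts: Counter, min_share: float = 0.2) -> List[str]:
    """Return class names whose share of all annotations is below min_share."""
    total = sum(counts.values())
    if total == 0:
        return sorted(counts)
    return sorted(name for name, n in counts.items() if n / total < min_share)


# Hypothetical toy dataset: 6 car labels, 3 bicycle labels, 1 wheelchair label.
sample = {
    "categories": [
        {"id": 1, "name": "car"},
        {"id": 2, "name": "bicycle"},
        {"id": 3, "name": "wheelchair"},
    ],
    "annotations": (
        [{"category_id": 1}] * 6 + [{"category_id": 2}] * 3 + [{"category_id": 3}]
    ),
}

counts = class_counts(sample)
needs_more_data = underrepresented(counts)
```

Here only the wheelchair class falls below the threshold, which flags it as a candidate for the kind of targeted data collection the quote refers to.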
