Uber’s Fiber Is A New Distributed AI Model Training Framework

April 1, 2020

Uber’s Fiber Is A New Distributed AI Model Training Framework

Artificial Intelligence researchers at Uber have recently posted a paper to Arxiv outlining a new platform intended to assist in the creation of distributed AI models. The platform is called Fiber, and it can be used to drive both reinforcement learning tasks and population-based learning.

According to the team of researchers, Fiber has recently been made open-source on GitHub, and it’s compatible with Python 3.6 or above, with Kubernetes running on a Linux system and running in a cloud environment.

Uber explains that many of the most recent and relevant advances in artificial intelligence have been driven by larger models and more algorithms that are trained using distributed training techniques whereas Fiber makes the distributed system more reliable and flexible by combining cluster management software with dynamic scaling and letting users move their jobs from one machine to a large number of machines seamlessly.

Fiber also makes use of containers to ensure things like input data and dependent packages are self-contained. The Fiber framework even includes built-in error handling so that if a worker crashes it can be quickly revived.

Fiber paper explain that Fiber is able to do achieve multiple goals like dynamically scaling algorithms and using large volumes of computing power.

Fiber achieves many goals, including efficiently leveraging a large amount of heterogeneous computing hardware, dynamically scaling algorithms to improve resource usage efficiency, reducing the engineering burden required to make [reinforcement learning] and population-based algorithms work on computer clusters, and quickly adapting to different computing environments to improve research efficiency.