Federated learning was proposed as an alternative to centralized machine learning because its client-server structure provides better privacy protection and scalability in real-world applications. Interest in it is growing quickly alongside distributed machine learning and rising privacy concerns. As the computing and communication capabilities of edge and IoT devices increase, training machine learning models with federated learning on heterogeneous devices is becoming a trend. The federated analytics approach, in turn, extracts insights from data residing on different systems without requiring the data to be brought to a central location. By drawing on these different data sources, federated analytics can provide useful insights in areas such as understanding end-user behavior. In this blog post, I will explain what federated analytics is and how it works so you can understand its potential applications for your business!
What is federated analytics?
Federated analytics is about extracting insights from data without collecting that data in one location. It is similar to federated learning in that neither requires all the data to be stored in one place. However, while federated analytics applies basic data science methods to analyze the data, federated learning trains machine learning models on the remote devices and sends the resulting model updates back to be aggregated into a shared model. Federated learning can loosely be viewed as a special case of federated analytics in which the insight being computed is a trained model.
In one of its posts on federated analytics, Google described the approach as collaborative data science without data collection.
What are some of the challenges of traditional analytics overcome by federated analytics/learning?
The federated approach is useful because it addresses several problems with traditional approaches to analytical insight:
- Need to pull the data to a central location: With federated analytics, data no longer has to be stored in one central location. Data can exist in different locations or on different devices, and only the insights are pulled into one central location. This provides greater flexibility in protecting data privacy while still gaining the analytical insight that you need to run your business.
- Need to manage large data storage infrastructure: With federated analytics, you do not need a huge infrastructure to store all of your data. Instead, multiple databases and information sources can be used together as one federated source while each remains in place, which makes it easier to stay in compliance with privacy laws and regulations.
- Need to ensure data privacy: In federated analytics, the data stays in its separate locations, which helps address data privacy and security concerns while still allowing insights to be derived.
- Need to have large computing power: Federated analytics works across data from multiple different sources. You do not need a large computational infrastructure at the central location, since most of the work is done locally at each source and only the aggregates are sent back to one central location.
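The pattern described in the list above, where each data source computes a summary locally and only the summary travels to a central location, can be sketched in a few lines of Python. This is a minimal illustration rather than a production design: the function names (local_histogram, merge_histograms) and the sample event data are hypothetical, and the sketch omits protections such as secure aggregation or added noise that real systems commonly layer on top.

```python
from collections import Counter

def local_histogram(events):
    """Runs on each device: reduce raw events to counts per category."""
    return Counter(events)

def merge_histograms(histograms):
    """Runs on the central server: add up the per-device counts."""
    total = Counter()
    for histogram in histograms:
        total.update(histogram)
    return total

# Hypothetical raw events; in a real deployment these never leave the devices.
device_events = [
    ["search", "play", "play"],
    ["play", "share"],
    ["search", "search", "play"],
]

# Only the per-device counts are reported to the server.
reports = [local_histogram(events) for events in device_events]
global_counts = merge_histograms(reports)
print(global_counts.most_common())
```

The server only ever sees the counts in reports, never the underlying event lists, which is the essential difference from pulling all raw data into one place.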
What is federated learning?
Federated learning is a machine learning approach that works on federated data. It belongs to the broader area of distributed machine learning and has also been studied in connection with multi-task learning (MTL). You may also come across related terms such as federated training, federated prediction, or federated inference. Google has published a great comic on federated learning that is worth a look.
Federated learning enables the training of machine learning (ML) models across many devices without centralized data collection, so that only the users have a copy of their data. One of the most important use cases of federated learning is analyzing user behavior, which can lead to better products while the underlying data stays private and secure because it never leaves the users’ devices.
How does federated analytics or federated learning work?
The key to federated learning is federated data, that is, data spread across multiple locations, storage systems, or end-user devices. The machine learning training takes place at these federated data sources and not at one central location. Because the models can be trained without the data residing in a single place, there is no need to extract all of the information into a centralized database before training. The challenges related to federated learning and federated analytics include the following:
- Device heterogeneity
- Data heterogeneity
- Data privacy and security on heterogeneous devices
- Applications on heterogeneous devices
In the federated learning paradigm, a central aggregation server builds the global model from gradient updates computed locally on the devices, which mitigates the privacy leakage that collecting sensitive information would cause. There are two aggregation strategies:
- Synchronous aggregation strategy: In the classic federated learning paradigm, the server waits for every selected device, including stragglers, before aggregating in each training round. As a result it cannot use resources effectively, especially on heterogeneous devices. Furthermore, in real-world scenarios, the disparity of the data dispersed across devices (i.e. data heterogeneity) degrades the accuracy of models.
- Asynchronous aggregation strategy: In the asynchronous federated learning paradigm, no waiting is required because the server updates the global model as soon as an individual device reports back. This is why the paradigm has been proposed in various application scenarios to improve efficiency, performance, privacy, and security.
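To make the difference between the two strategies concrete, here is a rough sketch in plain Python. It assumes the server averages model weights in the style of federated averaging (FedAvg) for the synchronous case, and for the asynchronous case it assumes a simple staleness discount, which is one common heuristic and not the only option. The function names, the base_mix parameter, and the tiny two-parameter "models" are all hypothetical.

```python
def fedavg(client_weights, client_sizes):
    """Synchronous round: wait for all clients, then average their model
    weights, weighting each client by its number of training examples."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

def async_update(global_weights, client_weights, staleness, base_mix=0.5):
    """Asynchronous step: blend a single client's model into the global
    model as soon as it arrives. Assumed heuristic: the older the global
    model the client started from (higher staleness), the less it counts."""
    alpha = base_mix / (1 + staleness)
    return [
        (1 - alpha) * g + alpha * c
        for g, c in zip(global_weights, client_weights)
    ]

# Synchronous: two clients holding 1 and 3 examples respectively.
sync_model = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])

# Asynchronous: one client reports a model trained from a global model
# that is one version old.
async_model = async_update([0.0, 0.0], [1.0, 2.0], staleness=1)
```

In the synchronous version the slowest client determines how long a round takes, while the asynchronous version trades that waiting time for the extra bookkeeping needed to handle stale updates.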
The machine learning algorithms used in federated learning are federated versions of standard machine learning algorithms. Some examples include federated mean estimation (FME), federated k-means clustering, federated least-squares algorithm, etc.
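As an illustration of the simplest of these, federated mean estimation can be sketched as follows. Each device reports only a (sum, count) pair, and the server combines the pairs into a global mean. The function names and sample values are hypothetical, and the sketch leaves out privacy mechanisms that a real system might add.

```python
def local_summary(values):
    """Runs on each device: reduce the raw values to a (sum, count) pair."""
    return (sum(values), len(values))

def federated_mean(summaries):
    """Runs on the server: combine the (sum, count) pairs into one mean."""
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    if count == 0:
        raise ValueError("no data reported by any device")
    return total / count

# Hypothetical per-device measurements that stay on the devices.
device_values = [[1.0, 2.0, 3.0], [10.0]]

summaries = [local_summary(values) for values in device_values]
mean = federated_mean(summaries)
print(mean)
```

Sending sums and counts rather than per-device means matters: averaging the device means directly would give each device equal weight regardless of how much data it holds.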
What are some real-world examples of federated analytics/learning?
The following are some real-world use cases for federated learning:
- User behavior analysis
- Generation of synthetic electronic health records using a federated GAN: A federated GAN can be used to create a dataset of “fake” patients through synthetic data generation (SDG), which helps work around usage constraints on real patient records.
Federated analytics is an analytics approach that works on federated data, and federated learning is closely tied to it. Both belong to the broader area of distributed machine learning, and related terms you may come across include federated training, federated prediction, and federated inference. This article explained what federated analytics is and how it works. It also covered some real-world examples of the technology, including user behavior analysis and the generation of synthetic electronic health records using a federated GAN to work around usage constraints. If you are interested in finding out more about the benefits of federated learning, please let us know.