Federated learning with PySyft and TensorFlow Federated


Federated learning, which trains machine learning models across distributed datasets without centralising the raw data, has matured from research concept to production deployment. PySyft and TensorFlow Federated (TFF) are two of the most widely used open-source frameworks. PySyft (OpenMined's Python library) enables privacy-preserving ML with secure aggregation and differential privacy on arbitrary Python ML code. TFF provides a functional federated programming model aimed primarily at TensorFlow models, with more limited JAX support, and builds on aggregation protocols developed at Google. This guide covers when to use each framework and the implementation patterns for enterprise federated learning.

When to Use Federated Learning Over Centralised Training

The Federated Learning Decision

Federated learning is appropriate when: (1) raw data cannot leave its source due to regulation (GDPR, HIPAA), contractual restriction, or data sovereignty requirements; (2) data is naturally distributed across organisations, devices, or jurisdictions with no single party authorised to aggregate it; (3) the data volume is too large to centralise efficiently. Compared with centralised training, it adds complexity: orchestration, aggregation protocols, and communication overhead. Don't use federated learning just for privacy signalling. If you can centralise the data with proper access controls, centralised training is simpler, faster, and usually produces better models.

PySyft vs TensorFlow Federated

Framework	Language	Privacy Mechanisms	Best For
PySyft (OpenMined)	Python — any ML framework	Secure aggregation, DP, SMPC, HE integration	Cross-silo FL; healthcare/finance; arbitrary Python code
TensorFlow Federated	Python — TF/JAX	DP-SGD (TF Privacy), secure aggregation	Cross-device FL; mobile; Google-stack teams
Flower (flwr)	Python — any framework	Framework-agnostic aggregation strategies	Research; mixed framework environments; PyTorch FL

10–30%

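Indicative performance gap between federated and centralised training at equivalent data volume. Non-IID data distribution, partial client participation, and a limited budget of communication rounds all reduce model quality relative to the centralised equivalent; the size of the gap depends heavily on the task and on how skewed the client data is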

FedAvg

Federated Averaging, the standard aggregation algorithm (McMahan et al., Google, 2017): each client trains locally for one or more epochs, then the server averages the client model updates, weighting each by the client's number of training examples. Implemented in all three frameworks compared above; adequate for most cross-silo use cases
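
The weighted average at the heart of FedAvg can be sketched in a few lines of NumPy. This is a minimal illustration, not any framework's implementation: the function name fed_avg, the three clients, and their example counts are all hypothetical.

```python
import numpy as np


def fed_avg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local example counts."""
    stacked = np.stack(client_weights)              # shape: (num_clients, num_params)
    sizes = np.asarray(client_sizes, dtype=float)   # one count per client
    return (stacked * sizes[:, None]).sum(axis=0) / sizes.sum()


# Three hypothetical clients holding different amounts of local data.
client_weights = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
client_sizes = [10, 30, 60]

global_weights = fed_avg(client_weights, client_sizes)
print(global_weights)
```

The client with 60 examples pulls the result furthest toward its own weights, which is the intended behaviour: clients contribute in proportion to the data they trained on.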

IID

IID (Independent and Identically Distributed) data is the ideal assumption for FL. In practice, each client's data is non-IID (hospitals treat different patient populations), and plain FedAvg can converge slowly or unstably as a result. Algorithms such as FedProx or SCAFFOLD are designed to stabilise convergence in this setting
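
To make the FedProx idea concrete, here is a minimal NumPy sketch of a client's local update. It assumes a least-squares loss and adds the FedProx proximal term (mu / 2) * ||w - w_global||^2, which penalises local weights for drifting away from the global model. The function name, the synthetic data, and the values of mu, lr, and steps are illustrative assumptions.

```python
import numpy as np


def fedprox_local_update(w_global, X, y, mu=0.1, lr=0.05, steps=50):
    """Local gradient descent on least squares plus the FedProx proximal term."""
    w = w_global.copy()
    for _ in range(steps):
        grad_loss = X.T @ (X @ w - y) / len(y)   # gradient of 0.5 * mean squared error
        grad_prox = mu * (w - w_global)          # gradient of (mu / 2) * ||w - w_global||^2
        w -= lr * (grad_loss + grad_prox)
    return w


# Synthetic data for one hypothetical client.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = X @ np.array([2.0, -1.0])
w_global = np.zeros(2)

w_plain = fedprox_local_update(w_global, X, y, mu=0.0)  # mu = 0 is a plain FedAvg local step
w_prox = fedprox_local_update(w_global, X, y, mu=1.0)

drift_plain = np.linalg.norm(w_plain - w_global)
drift_prox = np.linalg.norm(w_prox - w_global)
print(drift_plain, drift_prox)
```

With mu set to 0 the update reduces to ordinary local training. A larger mu keeps each client closer to the global model, trading some local fit for less client drift across heterogeneous data.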

PySyft Setup

Cross-Silo FL with PySyft

Install: pip install syft. Each data owner (hospital, bank, company) runs a PySyft Datasite server. The model owner submits a study request, a Python script defining the training code, to each Datasite. The Datasite operator reviews and approves the code, which then runs against the local data. Only model updates (gradients or weights) are returned, never raw data. This approval workflow is designed for cross-organisational FL in regulated industries. The model owner aggregates the returned updates using FedAvg. Deploy each Datasite on the data owner's own infrastructure so that the data never leaves the owner's control.
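
The request, approve, execute, aggregate flow described above can be simulated in plain Python. The sketch below deliberately does not use the PySyft API: the Datasite class, its approve and run methods, and train_linear are hypothetical stand-ins that only illustrate the control flow, and the name-based approval check is a placeholder for a human code review.

```python
import numpy as np


class Datasite:
    """Hypothetical stand-in for a data owner's server; not the PySyft API."""

    def __init__(self, X, y):
        self._X, self._y = X, y      # private data stays inside this object
        self._approved = set()

    def approve(self, train_fn):
        # Placeholder for the operator's manual review of the submitted code.
        self._approved.add(train_fn.__name__)

    def run(self, train_fn, global_weights):
        if train_fn.__name__ not in self._approved:
            raise PermissionError("training code has not been approved")
        new_weights = train_fn(self._X, self._y, global_weights)
        return new_weights, len(self._y)  # only weights and an example count leave


def train_linear(X, y, w, lr=0.1, steps=20):
    """Training code submitted by the model owner: a few least-squares steps."""
    w = w.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w


rng = np.random.default_rng(1)
true_w = np.array([1.5, -0.5])
sites = []
for n in (20, 50):                        # two data owners with different data volumes
    X = rng.normal(size=(n, 2))
    sites.append(Datasite(X, X @ true_w))

w_global = np.zeros(2)

rejected = False
try:
    sites[0].run(train_linear, w_global)  # unapproved code is refused
except PermissionError:
    rejected = True

for site in sites:
    site.approve(train_linear)

for _ in range(10):                       # federated rounds
    results = [site.run(train_linear, w_global) for site in sites]
    total = sum(n for _, n in results)
    w_global = sum(w * n for w, n in results) / total  # FedAvg on the model owner's side

print(rejected, w_global)
```

The model owner only ever sees weight vectors and example counts. In a real deployment the approval step, code transport, and execution sandbox are what the framework provides.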

pip install syft · Datasite per organisation · Study approval workflow

TFF Setup

Federated Training with TensorFlow Federated

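Install: pip install tensorflow-federated. Define the federated dataset as one tf.data.Dataset per client: train_data = [tf.data.Dataset.from_tensor_slices(client_data) for client_data in clients]. Use the built-in FedAvg process: iterative_process = tff.learning.algorithms.build_weighted_fed_avg(model_fn, client_optimizer_fn, server_optimizer_fn), where model_fn returns a TFF model (for example, a Keras model wrapped with tff.learning.models.from_keras_model). Simulate training: state = iterative_process.initialize(); result = iterative_process.next(state, train_data[:10]); state, metrics = result.state, result.metrics. The open-source TFF runtime targets simulation, including multi-machine simulation over gRPC; deploying to real client devices requires separate platform infrastructure. Add DP: pass tff.learning.dp_aggregator(noise_multiplier, clients_per_round) as the model_aggregator argument. Because it is an unweighted aggregator, pair it with tff.learning.algorithms.build_unweighted_fed_avg. Signatures vary between TFF releases, so check the API reference for the installed version.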
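
The mechanism behind a DP aggregator is easier to see outside the framework. The NumPy sketch below is not TFF's implementation: it shows the generic clip-and-noise recipe, in which each client update is clipped to a fixed L2 norm, the clipped updates are summed, Gaussian noise scaled by the clipping norm is added, and the result is averaged. The function name dp_mean and all parameter values are illustrative assumptions, and the sketch does no privacy accounting.

```python
import numpy as np


def dp_mean(updates, clip_norm, noise_multiplier, rng):
    """Clip each update to clip_norm, sum, add Gaussian noise, then average."""
    clipped = []
    for u in updates:
        norm = np.linalg.norm(u)
        scale = min(1.0, clip_norm / max(norm, 1e-12))  # shrink only if norm > clip_norm
        clipped.append(u * scale)
    total = np.sum(clipped, axis=0)
    # Noise standard deviation is proportional to the per-client sensitivity (clip_norm).
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(updates)


rng = np.random.default_rng(0)
# Three hypothetical client updates with L2 norms 5.0, 0.5, and 10.0.
updates = [np.array([3.0, 4.0]), np.array([0.3, 0.4]), np.array([-6.0, 8.0])]

noiseless = dp_mean(updates, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
noisy = dp_mean(updates, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
print(noiseless, noisy)
```

Clipping bounds how much any single client can move the average, and the noise masks what remains of each contribution. Turning noise_multiplier into a formal privacy guarantee requires an accountant, which this sketch omits.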

pip install tensorflow-federated · build_weighted_fed_avg · DP aggregator


SCALE D2C Editorial Team

Confidential Computing and P Research · March 2026
