In 2015, Deutsche Bahn (DB) launched a uniquely complex project for passenger information. Since then, we have been significantly involved, together with AWS and other partners, in creating a platform that distributes information consistently across all connected channels.
- Project: Big data platform for Deutsche Bahn travel information
- Duration: From Q4/2015, ongoing
- Methodology: Design thinking, Agile, Scrum
- Team: System architects, data engineers, DevOps, Scrum masters
- Framework: Amazon Web Services (AWS) cloud
Imagine the scene: You rush to the platform from which your train is supposed to depart, only to find that the display there says something different from the rail company's app, which you checked only two minutes ago. If you've ever traveled by train, you've probably experienced something like this and wondered why the same information can't be shown on every information channel a traveler might use. The answer is quite simple: Until now, the data was not fully accessible to all information channels.
In 2015, German rail operator Deutsche Bahn (DB) decided to develop a single, shared platform for travel information. The vision was to create a "Single Point of Truth" (SPOT) that would distribute information consistently across all connected channels.
The company put together a team of internal DB employees and external service providers, including us. The team was given the brief of using agile methods to develop a new passenger information platform.
The solution is a big data platform built on open-source technologies and a microservice architecture. The microservices consolidate and evaluate data from a range of sources and then feed it consistently to information channels such as platform displays at stations and the DB Navigator app.
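The core idea of a single point of truth, where data is consolidated once and every channel is fed from that one consolidated state, can be illustrated with a minimal in-memory sketch. The class and field names (`SinglePointOfTruth`, `TrainStatus`, `delay_min`) are hypothetical and do not describe the platform's actual services. Channels are modeled as plain callbacks standing in for a station display or the app.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TrainStatus:
    """Hypothetical consolidated record for one train."""
    train_id: str
    platform: str
    delay_min: int


class SinglePointOfTruth:
    """Hypothetical in-memory SPOT: one consolidated state, pushed to every channel."""

    def __init__(self) -> None:
        self._state: dict[str, TrainStatus] = {}
        self._channels: list[Callable[[TrainStatus], None]] = []

    def subscribe(self, channel: Callable[[TrainStatus], None]) -> None:
        # A channel could be a platform display, an app backend, etc.
        self._channels.append(channel)

    def update(self, status: TrainStatus) -> None:
        # Consolidate first, then fan out the same record to every channel,
        # so no two channels can show different data for this train.
        self._state[status.train_id] = status
        for channel in self._channels:
            channel(status)

    def current(self, train_id: str) -> TrainStatus | None:
        return self._state.get(train_id)


# Usage: two channels subscribe and always receive identical updates.
spot = SinglePointOfTruth()
display_feed: list[TrainStatus] = []
app_feed: list[TrainStatus] = []
spot.subscribe(display_feed.append)
spot.subscribe(app_feed.append)
spot.update(TrainStatus("ICE 123", "7", 0))
spot.update(TrainStatus("ICE 123", "9", 5))
```

The design choice this illustrates is that channels never query sources directly; they only ever see what the consolidated state has already accepted.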
Due to the enormous data volumes involved, the computing capacity required for near real-time data processing, and the need to access the data from any location, the team knew from the outset that the application would need to be developed and operated in the cloud. At an early stage, a decision was made to partner with Amazon Web Services (AWS), as its infrastructure and services were the best fit for Deutsche Bahn's needs.
The key objective was to implement a SPOT that would distribute information consistently across all information channels and touchpoints.
The starting point for this process did, however, present a number of challenges for our data specialists. The data needed to be acquired from a large number of different sources, some of which had very complex interfaces. Some of the protocols and data formats from proprietary solutions were outdated, which was a further complicating factor in the consolidation process. The team also needed to make provisions for technological consistency by incorporating an automatic update feature into the solution.
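One common way to handle sources with different protocols and data formats is to put a small adapter in front of each source that translates its records into one shared schema before consolidation. The sketch below assumes two invented formats, a semicolon-separated legacy line and a JSON message; the formats, field names and the `normalize` function are hypothetical and are not taken from DB's actual interfaces.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainEvent:
    """Hypothetical common schema that every source is normalized into."""
    train_id: str
    platform: str
    delay_min: int


def from_legacy_line(line: str) -> TrainEvent:
    # Hypothetical legacy format: "ICE 123;07;+5"
    # (zero-padded platform, signed delay in minutes)
    train_id, platform, delay = line.strip().split(";")
    return TrainEvent(train_id.strip(), platform.lstrip("0") or "0", int(delay))


def from_json(payload: str) -> TrainEvent:
    # Hypothetical modern format: {"train": "ICE 123", "track": "7", "delaySeconds": 300}
    data = json.loads(payload)
    return TrainEvent(data["train"], str(data["track"]), int(data["delaySeconds"]) // 60)


# One adapter per source type; adding a source means adding an adapter.
ADAPTERS = {"legacy": from_legacy_line, "json": from_json}


def normalize(source: str, raw: str) -> TrainEvent:
    try:
        adapter = ADAPTERS[source]
    except KeyError:
        raise ValueError(f"no adapter registered for source {source!r}") from None
    return adapter(raw)
```

For example, `normalize("legacy", "ICE 123;07;+5")` and a JSON message reporting 300 seconds of delay on track "7" both yield the same `TrainEvent`, so downstream consolidation never needs to know which format a record came from.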
The *um Contribution
The project started in 2015 with a Proof of Concept (PoC), which was intended to show that the process of data consolidation could be successfully achieved using big data technologies. The PoC also needed to prove that all of the raw data types from different sources could be brought together to form a complete, consistent, and up-to-date data status at a single point in near real-time.
Our first task was to develop the relevant system architecture and then, based on this architecture, a timetable builder. This system generates a complete target timetable from the customer timetable and the operating timetable. Last-minute adjustments to the timetable and real-time data such as train position reports from track sensors were then incorporated. At the end of this process, we had our first consolidated view of all of the data generated from a number of completely independent systems.
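The layering described above (customer timetable plus operating timetable, then last-minute adjustments, then real-time data) can be pictured as successive overlays keyed by train and station. This is a simplified sketch, not the actual timetable builder. Its precedence rules are assumptions: operating entries override customer entries for the same stop, adjustments override both, and a position report is assumed to have already been turned into a single delay estimate that shifts every stop of that train. All names (`Stop`, `build_target_timetable`, `apply_adjustments`, `apply_delay`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Stop:
    train_id: str
    station: str
    departure_min: int  # minutes after midnight
    platform: str


Key = tuple[str, str]  # (train_id, station)


def _index(stops: list[Stop]) -> dict[Key, Stop]:
    return {(s.train_id, s.station): s for s in stops}


def build_target_timetable(customer: list[Stop], operating: list[Stop]) -> dict[Key, Stop]:
    # Assumption: the operating timetable wins wherever both describe the same stop.
    target = _index(customer)
    target.update(_index(operating))
    return target


def apply_adjustments(target: dict[Key, Stop], adjustments: list[Stop]) -> dict[Key, Stop]:
    # Last-minute adjustments (e.g. a platform change) override the plan.
    merged = dict(target)
    merged.update(_index(adjustments))
    return merged


def apply_delay(target: dict[Key, Stop], train_id: str, delay_min: int) -> dict[Key, Stop]:
    # Simplification: shift every stop of the train by the estimated delay.
    return {
        key: replace(stop, departure_min=stop.departure_min + delay_min)
        if key[0] == train_id else stop
        for key, stop in target.items()
    }
```

Keeping each step a pure function that returns a new mapping makes it easy to see which layer produced a given value, which matters when independent systems disagree.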
After finalizing the PoC, the next task was to implement the project itself. The data platform we had developed remained in place as the basis for reading and analyzing the data. We were responsible for the conceptual and technical design of the big data architecture and for its implementation. Throughout the process, we followed a DevOps approach and applied Nexus, the framework for scaling Scrum, which helped us to resolve interdependencies between teams and avoid integration problems.
The project team then reached its first milestones with tangible benefits for customers, including the ability to use the processed data to identify platform changes more reliably and at a significantly earlier stage.
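Once planned and current data sit in one consolidated view, spotting a platform change reduces to comparing the planned platform with the latest known one for each train and station. The minimal sketch below assumes both are available as dictionaries keyed by `(train_id, station)`; the function and type names are hypothetical and this is not the platform's actual detection logic.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformChange:
    train_id: str
    station: str
    planned: str
    current: str


def detect_platform_changes(
    planned: dict[tuple[str, str], str],
    current: dict[tuple[str, str], str],
) -> list[PlatformChange]:
    """Compare planned vs. latest consolidated platform per (train, station)."""
    changes = []
    for (train_id, station), planned_platform in planned.items():
        # No newer information for this stop means no change.
        latest = current.get((train_id, station), planned_platform)
        if latest != planned_platform:
            changes.append(PlatformChange(train_id, station, planned_platform, latest))
    # Deterministic order so every channel lists changes the same way.
    return sorted(changes, key=lambda c: (c.train_id, c.station))
```

Because the comparison runs against the consolidated state rather than any single source, a change becomes visible as soon as any connected source reports it.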
In May last year, the team achieved its first major implementation success: more accurate predictions of arrival and departure times for long-distance journeys. The key to this was combining big data analyses on the platform we had developed with the machine learning and artificial intelligence components integrated by our colleagues.
The game-changing milestone was reached in August, when the passenger information platform went live. During the pilot phase that followed, almost 50 stations were switched over to the new system before Q1 2019.
There is still a lot of work to be done on the project. The foundations are now in place, but the really "cool" features that exploit the full potential of the new system will be rolled out over the next few years.
Next on the agenda for this year is to complete the pilot phase and the rollout of the new passenger information platform. The system will initially be deployed at stations with less complex operations, but in the future it will be operated 24/7 at all stations in Germany, which will be challenging in terms of server capacity in particular. We are working hard towards a nationwide rollout so that passengers can benefit from consistent and reliable travel information through all channels.
The event-driven architecture is based on the AWS Cloud in conjunction with the following open-source technologies:
- Apache Storm
- Apache Spark
- Apache Kafka
- Apache Kafka Streams
- Apache Cassandra
- Hazelcast
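In an event-driven pipeline built on Kafka, events for the same train are typically published with the same key, so they land in the same partition and are consumed in order. The sketch below illustrates that routing idea only. It uses CRC32 as a stand-in hash (Kafka's default partitioner uses murmur2), and the topic size `NUM_PARTITIONS` and the event payloads are hypothetical.

```python
from __future__ import annotations

import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical topic size


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in for keyed partitioning: the same key always maps to the same
    # partition. CRC32 is used for illustration only; Kafka itself uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(
    events: list[tuple[str, str]], num_partitions: int = NUM_PARTITIONS
) -> dict[int, list[tuple[str, str]]]:
    """Append (key, payload) events to partitions, preserving arrival order per partition."""
    partitions: dict[int, list[tuple[str, str]]] = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return dict(partitions)


# Usage: all position reports for "ICE 1" end up in one partition, in order.
events = [("ICE 1", "pos A"), ("RE 5", "pos X"), ("ICE 1", "pos B"), ("ICE 1", "pos C")]
partitions = route(events)
```

Ordering is guaranteed only within a partition, which is why the key is chosen per train: a consumer then always sees a given train's reports in the sequence they were produced.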