By 2022, the amount of data created, captured, copied, and consumed worldwide is forecast to reach 97 zettabytes, up from two zettabytes in 2010 (1). This unprecedented growth in the variety, complexity, and velocity of data, combined with consumers' increasing demands for more timely, trusted data, is putting growing pressure on CDOs and data leaders to adapt.
Enter the modern-day data engineer—a role as hard to define as it is to fill. To deliver high-impact business value, organizations need versatile and flexible data engineers to enable the capture, processing, and use of data on a massive scale. The Department of the Air Force CDO, Eileen Vidrine, reinforces this point, “Data Engineers are builders at heart. Gartner did a report showing that 85% of data initiatives fail due to a lack of data infrastructure. Data engineers are critical. The Department of the Air Force is looking for Airmen and Guardians who have capabilities in multiple skillsets, such as agile delivery and open-source solutions like Pandas, data modeling, DevSecOps, and finding the blend of business process and deep technical knowledge.”
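To make that concrete, the short sketch below illustrates the kind of open-source, Pandas-based data wrangling Ms. Vidrine references; the dataset and column names are purely hypothetical.

```python
import pandas as pd

# Hypothetical raw extract; record_id, event_date, and status are illustrative names.
raw = pd.DataFrame({
    "record_id": [101, 102, 102, 103],
    "event_date": ["2021-01-05", "2021-01-06", "2021-01-06", None],
    "status": ["Open", "closed", "closed", "OPEN"],
})

clean = (
    raw.drop_duplicates(subset="record_id")                      # remove duplicate records
       .assign(
           event_date=lambda d: pd.to_datetime(d["event_date"], errors="coerce"),
           status=lambda d: d["status"].str.lower(),             # standardize categorical values
       )
       .dropna(subset=["event_date"])                            # drop rows with no usable date
)
print(clean)
```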
Developing a productive and thriving data engineering team hinges on the abilities of CDOs and data leaders to artfully choreograph the delicate interplay between people, processes, and technology.
In this hyper-evolving landscape, a core skill today may become obsolete tomorrow. The deluge of new tools and technologies means specialization no longer suffices. Instead, CDOs and data leaders require a new breed of data engineer with continuous learning and growth mindsets who fully embrace the organization's vision. “Data Engineers are sense makers. Personality traits of successful Data Engineers are: curiosity and an ability to work with the team,” said Elizabeth Puchek, CDO of U.S. Citizenship & Immigration Services (USCIS).
These teams must welcome continuous upskilling, which requires motivation and a passion for pushing boundaries, and can only blossom through supportive leadership that actively encourages a continuous learning approach. “Applying the knowledge to practical challenges in your organization. Give them a specific project to test and apply their newly acquired skillset,” said Ms. Puchek. Data engineering teams need time to learn and experiment—and the safe space to fail. The resulting knowledge, innovation, and skills will deliver value to both clients and the organization. Ms. Vidrine highlighted an example of enabling innovation involving data engineers: “the Department of the Air Force regularly conducts datathons with Airmen and Guardians to use innovative techniques to solve complex business problems. A small team of Airmen in collaboration with the Massachusetts Institute of Technology Artificial Intelligence Accelerator developed a minimally viable product that is in production today which improved the scheduling accuracy of C-17 by 98%.”
To empower data engineers, CDOs and data leaders must:
Minimize Friction
Data engineers must contend with IT/Security teams' policies, dependencies, and inefficiencies, as well as outdated but institutionalized ’90s-era waterfall methodologies. These barriers limit the size, speed, and scale of agile data projects.
Since data engineers are central to curating data into competitive advantage, CDOs and data leaders need to minimize IT/Security friction. Changing incentive structures can align the objectives of IT/Security teams with those of data engineering. This could entail, for example, a shared incentive bonus structure that rewards both teams for delivering data products against a predefined schedule and goals.
Accelerate Value Creation
Data engineers determine the capacity and speed with which downstream units can consume data to deliver organizational and customer value. CDOs and data leaders need to break down silos to create integrated data and business teams to ensure genuine collaboration and shorten time to value. Elizabeth Puchek further reinforces this in her remarks, “Clearly establishing the vision. You’ve got to get them excited. Let them know that what they are doing has a direct impact on real people and can keep them safe.”
Three waves of innovation have converged to change the way enterprises do business and the way data engineers work:
Cloud Computing. It is typically cheaper than hosting your own data center, shifts organizational costs from CapEx to OpEx, and offers scalability, resiliency, high availability, and operational efficiencies. COVID-19 and remote work have accelerated cloud adoption to accommodate spikes in network and application traffic. Data engineers need a modern cloud skill set, with an in-depth understanding of at least one public cloud provider such as AWS, Azure, or GCP.
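As a minimal sketch of that cloud skill set, assuming an AWS account with credentials already configured and the boto3 library installed, a data engineer might inventory the account's storage buckets programmatically:

```python
import boto3

# Assumes AWS credentials are configured (e.g., environment variables or an IAM role).
s3 = boto3.client("s3")

# List the account's S3 buckets: a small but representative use of a cloud provider's SDK.
for bucket in s3.list_buckets()["Buckets"]:
    print(bucket["Name"], bucket["CreationDate"])
```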
DevOps/DevSecOps. This trend emerged from the increased speed and flexibility of software development using cloud services. It allows developers to build solutions quickly and easily without purchasing and managing hardware, making code deployment relatively painless. The modern data engineer needs to be skilled in DevOps/DevSecOps tooling and to uphold the culture and processes that enable it.
This approach can transform collaboration, communication, continuous learning, and improvement, and it ushers in a suite of technologies and toolchains that foster these advances, such as distributed version control systems (e.g., Git), infrastructure as code (e.g., Terraform and CloudFormation), and software automation tools (e.g., Jenkins and Puppet).
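The sketch below is one hedged illustration of infrastructure as code, using the AWS CDK for Python (which synthesizes CloudFormation templates); the stack and bucket names are assumptions, not prescriptions.

```python
# Minimal infrastructure-as-code sketch, assuming AWS CDK v2 for Python is installed.
from aws_cdk import App, Stack, aws_s3 as s3
from constructs import Construct

class RawDataStack(Stack):
    """Declares a versioned, encrypted S3 bucket for a hypothetical raw-data zone."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        s3.Bucket(
            self,
            "RawZoneBucket",
            versioned=True,
            encryption=s3.BucketEncryption.S3_MANAGED,
        )

app = App()
RawDataStack(app, "RawDataStack")
app.synth()  # emits a CloudFormation template that CI/CD tooling (e.g., Jenkins) can deploy
```

Checking definitions like this into Git gives teams reviewable, repeatable environments instead of hand-built ones.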
Containerization. Data engineers play an important role in adopting and deploying this essential innovation in packaging and running applications. Containerization enables a container image to be provisioned and scaled in seconds and simplifies the movement of code through development, testing, and production in an automated (e.g., CI/CD), consistent, and reproducible manner. Combined with streamlined DevOps practices, containerization reduces bugs and failures, unnecessary hand-offs, and dependencies across teams.
Along with Kubernetes, often described as the operating system of the cloud (2), containerization has driven a massive shift in how we consume, build, and operate technology. It is also fundamentally altering the traditional data engineering role and must be reckoned with to enable a modern, agile, adaptive, multi-skilled, and collaborative data practice.
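As a small, hedged sketch of what that shift means day to day, assuming the official Kubernetes Python client and a cluster reachable through the local kubeconfig, a data engineer can scale out a containerized workload in seconds; the deployment and namespace names below are hypothetical.

```python
from kubernetes import client, config

# Assumes a cluster is reachable via the local kubeconfig.
config.load_kube_config()
apps = client.AppsV1Api()

# Scale a hypothetical 'etl-worker' deployment to five replicas,
# rather than provisioning and configuring new servers by hand.
apps.patch_namespaced_deployment_scale(
    name="etl-worker",
    namespace="data-eng",
    body={"spec": {"replicas": 5}},
)
```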
Adapting data engineering roles to the modern data landscape is a daunting task. CDOs and data leaders must empower their organizations to embrace new technologies and skills, budget for abundant upskilling, and implement frictionless processes to accelerate value creation. As Ms. Vidrine eloquently stated, "CDOs and data leaders need to build up the next generation of data engineers by allowing them to 'be the expert,' stay true to their technical roots, and have regular mentorship to be ready. Every day we should be asking… 'Are they ready?' Because there are hidden capabilities that senior leaders need to cultivate."
(1) Statista, “Volume of data/information created, captured, copied, and consumed worldwide from 2010 to 2025 (in zettabytes),” https://www.statista.com/statistics/871513/worldwide-data-created/.
(2) John Arundel and Justin Domingus, Cloud Native DevOps With Kubernetes: Building, Deploying, and Scaling Modern Applications in the Cloud, (O'Reilly Media, 2019).
For further insights on the people, process, and technology dimensions that CDOs and Data Leaders need to consider for modern-day data engineers, please contact Guidehouse at itstrategy@guidehouse.com.