2020 Data Themes

It’s January, which means you’ll see a ton of predictions about what’s going to happen in data this year. It’s also 2020, which means a few daring people will take a stab at what the next decade will look like. Predictions are a funny thing - you need to be right about both what will happen and when. Yearly predictions are mostly trivia and novelty. They’re fun, but they miss the point of the macro themes that underlie many predictions.

At Ternary, we tend to think in terms of themes, not predictions. Last year, we identified some themes woven across various aspects of the data engineering world. Let’s first look at our themes for 2019, and see how they’re doing. Then we’ll take the bold (some would say foolish) move of laying out some themes for the ‘20s.

A look back at our 2019 themes

Our 2019 data themes post.

Our themes from 2019 were mostly on target. They fell broadly into a few buckets - data is difficult, technology will continue simplifying and stripping away operational tedium, and privacy and ethics will continue putting checks and balances on AI progress.

  • Data will still be hard for most companies - Definitely correct. Data will continue to be difficult. This is a near softball of a theme for data. Despite the press about the looming adoption of AI, we find that most companies are barely doing proper analytics and lack a robust data architecture. No surprise, we consider the difficulty of data success a continued theme not just in 2020, but also throughout the decade.

  • Serverless will gain stronger adoption - 2019 saw advances in serverless such as Amazon EventBridge and Google Cloud Run. Less managed infrastructure is usually a good thing, and IT managers will continue to migrate applications to serverless where it makes sense. We are curious to see where serverless data pipelines and infrastructure go, as these tend to lag behind serverless application advancements. Especially in light of multi/hybrid cloud, it also makes sense that serverless will be around to keep cloud platforms sticky.

  • Containerization will accelerate multi/hybrid cloud - AWS Outposts and Google Cloud Anthos make it clear that clouds realize that customers want options. So, if you can’t beat your competition, make yourself ubiquitous and capitalize where you can.

  • DataOps will become more widely adopted - Still in the buzzword stage, but we see DevOps principles becoming more and more common in Data Engineering and Data Science. The AI Manifesto encapsulates our thoughts in this area.

  • Data lineage will grow in importance - Anecdotally, this is a growing trend. The market for affordable or open source data lineage tools is still early. We think this market will continue to develop, especially in light of GDPR, CCPA, and a general recognition of data privacy.

  • Ethics issues from data science will become a data engineering problem - AI ethics was the big topic of 2019. It’s just getting started. It’s incredibly difficult to have a conversation about AI where ethics isn’t part of the discussion. We think ethical uses of data will not just be a big topic for 2020, but a big topic for the decade as well.

What about 2020?

It’s pretty safe to say our 2019 themes in data will continue being big themes in 2020 as well, and probably beyond.

Here are some additional themes.

  • Data Scientists will begin fragmenting into more specific roles and titles. The term ‘data scientist’ is incredibly vague and is losing its meaning. Expect titles to become more specific and meaningful. Since 85%+ of ML models don’t make it to production, it’s going to be harder for companies to justify paying data scientists a lot of money unless they’re seeing results. We think data science will become much more engineering focused out of business necessity.

  • Companies will get back to basics with their data. Anecdotally, we see many companies attempting to do AI/ML when they barely have functioning analytics. The challenges and importance of analytics will force companies to scale back their efforts and build a solid data foundation. There will be an analytics renaissance. Companies failing at AI/ML will realize they are suffering from poor analytics, and focus more on building up analytical clout in the form of better data pipelines, warehousing, and business/operational intelligence.

  • AutoML will continue supplanting traditional data science practices. Mostly because many problems can be decently solved using off-the-shelf solutions. For harder problems that require domain expertise, data scientists will be forced to focus on the "science" in data science.

  • Pipelines, pipelines, pipelines. This is the hard part. What's new in pipelines is that we can think of numerous interconnected services in the cloud instead of one or a handful of systems.

  • Data sharing - within and across companies - will become more popular as companies realize their data is valuable currency.

Our stab at the data themes for the next 10 years…

  • All companies will be data companies. It will be impossible to survive without the ability to use data in every process, most of which will be automated.

  • Smartphones - and most screens - will be quaint antiques. The internet will simply be...everywhere. 5G (and 6G and beyond) will bring AR/VR and ambient computing to the masses. If you think the smartphone revolution was interesting, you haven’t seen anything yet.

  • China vs Western AI - this is the elephant in the room that will shape the role of how AI and data are used in society and business. If you think the ethical AI debates are interesting now, just wait.

  • Data transformation will still be difficult for most companies. Digital transformation came on the scene in the late 1990s, yet it remains difficult for most companies to this day, and most efforts fail. Data is arguably more difficult, and companies that succeed will see great gains.

  • We’ll start seeing the effects of automation on the workforce, and society at large. If you think the recent election was interesting, you haven’t seen anything yet.

  • Blockchain and data will converge into a useful utility that will seem incredibly obvious in hindsight. This is the case with nearly every revolutionary advance in technology.

Will we be right? Who knows. A lot can happen between now and 2030 (heck, a lot can happen between now and 2021!). As Bill Gates said, “Most people overestimate what they can do in one year and underestimate what they can do in ten years.”

Joe Reis