Streaming Analytics: Generating Real Value From Real Time (Cloud Next ’19)

[MUSIC PLAYING] EVREN ERYUREK: My
name is Evren Eryurek. I am the director of product
management for data analytics. And I am so delighted
to have all of you here. And I have two guest speakers
with me, Remi and Sergey, who are going to join us. They have a
fantastic use case, and they're going to share their
experience with all of you. Without further ado, I want to
talk to you a little about data and what’s happening
with the data. How many of you have
seen this study? By 2025, a quarter of the data created globally will be real-time in nature. A quarter of the data. That's a lot of data, guys. And that is not much time. So we've got to think
about what we’re going to do to be ready for
real-time data analytics. What’s driving this? Who doesn’t have a
cell phone right now? That's good. No hands. So we're all using them. You're taking pictures. Now, they're getting processed. They're going through the cloud. You're using it
for buying stuff. I know I’m doing it,
and I know how my data is being utilized. All the communications,
we’re doing it. Think about all
the manufacturing, what everybody is doing. All the data we’re
generating at all times– media, manufacturing,
enterprise, personal, shopping, what have you, everywhere. The common thread
is, I want to create a data-driven organization. Who doesn’t want to create a
data-driven organization today? We have chief data officers,
chief digital officers. They’re all tasked to establish
a data-driven organization. Let’s talk about
what that means. We’ll keep that in mind, right? What happens in this
data-rich environment? Who doesn’t buy stuff
online today, right? When you make a
purchase, you probably saw an advertisement somewhere. And then you decided
to go after it. You click on it, and then you
create some data right there. Either you bought it, or
you’re considering it. If you bought it,
you’re going to ship it. You shared some of
your information with whomever the
purchasing that you did it. On the other end, we have
the inventory systems in place value. We have the financial
systems for you, so you can make your
payments, so you can track it, and so forth. In addition, all the
other marketing systems are generating data around it. I know this. I’ve been buying stuff
for my younger son. He decided to build
his own scooter. I never knew anything
about the axles, different sizes of axles. Now, I know. And all the targeted ads
that I’m getting about it– this is a better
product, and so forth– I’m trying to help him with
that, thanks to you guys. What happened? They actually
recognized my need. They knew what I
was trying to do. They knew where I was
making my searches. They knew the types
of searches that I made from the categories,
all that information is in there, driven by the
events that I triggered. Take a look at the things that
I evaluated– this versus that. Take a look at the
decision that I made– what I
wanted to purchase, what I didn’t want to purchase,
and how I came to that point. And what I did
after I purchased, which happened three times– I bought the wrong thing. I sent it back. It was a whole
event-driven system. Who’s doing the
online gaming today? I am not one of them. My sons are. My older son, it turns out,
is one of the online gamers. He is racing. He is actually pretty good. He is not making
enough money yet, but he's going to get there. And this is a big event. So what happens there? There's the time played: we limit how much they play during school days, but the system knows how much time they are putting in there,
how they’re progressing. Is he becoming a
better driver or not? And his age, all
the IP addresses, all that information,
again, it triggers all the back-end
systems about what we do and how we deal with it. In this case, what are the types
of events we’re dealing with? Did he beat the regular bots? Is he really beating people when
he races online, real people? How much time is he spending
on that particular game? Or whether or not he is
talking to his friends– which many products
that they have online are enabling each other
to really talk to one another. And they decide what
they should buy. I know it, because
I get an alert. They just bought this online
using my credit cards. OK, how did that happen? But that’s all
driven by the events that we’re actually
pushing forward. So what happened to
my, I want to create data-driven organization? Maybe I want to create an
event-driven organization. That’s what we want
to get to maybe. That’s probably the path
that we want to go to. Or are they exclusive? If you look at it,
they’re not exclusive. One feeds the other one. I love the positioning in here. Take a look at what
we’re trying to do with the strategic vision
that actually drives the operational priorities. If I am the CEO, if I
am the Chief Operating Officer, of a
manufacturing plant, I know what I need
to be building. I know what I need. That will really actually
help me drive my priorities– tactically, operationally–
while I am actually driving my strategy. Those timelines are different. Those characteristics might be
common in some cases from use case to use case, but
timelines are different. Sensitivities are different. On the strategic decision,
I can actually take my time to do my segmentation. But when I’m actually
buying something, I have to make that decision. I have to make sure that I am
targeting the right person. Yeah, everyone was trying to
buy this part for the scooter. I’m going to pop that one
in here, because that’s what he is searching for,
not something for his wife. That’s the decision
that we need to get to. There are really interesting
dynamics going on in here. But both of the
decisions that we’re making– short-term
and long-term– are driven from
these sets of data. Time is very critical in here. The time spent between when you decide to buy and when you actually buy, and how you can actually add
more value is very important. And we’re still dealing
with the same sort of data, but the scale is different. Decision-making is
a little different. How do we address this
ever-getting-smaller window of opportunity to turn
it into the right decision, so that purchasing is done? Decisions are made? Critical decisions are made? Some of these things I'm joking about here. Scooter-buying is not
a critical decision. But if you’re running
a manufacturing plant, if you’re running an
enterprise, every second matters on every
decision that we make. Event activation is
all about the speed– speed of analytics,
speed of action. That’s what we’re going
to be talking about. This is where we’re going. This is what we’re seeing
throughout the world– how things are moving there. Now if you look at the
traditional platforms, why are we struggling? Why is the trend towards
real-time analytics, streaming analytics, decision-making
is happening, yet the current traditional
platforms are not able to do it? Well, one thing, we can’t
ingest the data fast enough, let alone process it, let alone get the readily available data into the hands of the right people. And consider all
the tool sets that are out there trying to
tap into the set of data to make a decision. They’re, at best,
disconnected, not really relevant to one another. So when you look at
the enterprise level– and I was a senior VP and CTO
of GE Healthcare at one point– and I know what a large-scale
system globally means. And how you make those decisions is very important. And the need around
ingestion, need around having a unified system
for both batch and streaming is important. Serverless architecture
is absolutely important. I'm sure you've all attended some of the sessions on serverless systems. And tools– having
the right set of tools for the right set of decisions
that you want to make is essential. And most importantly,
flexibility for the users, because we want
to enable everyone to become a data analyst,
a strategic decision-maker with the tool sets
that we have enhanced. A little bit about
what we are doing on Google Cloud around
stream analytics and how we are tackling them. Serverless architecture,
ingestion services, batch and stream processing all
unified, not one or the other, both with the same set of code. Comprehensive set of analytics
tools, from your ML tools to TensorFlows to
BigQuerys, you name it, and all the flexibilities
that you need. When I talk about
ingestion, some of you may have heard Cloud Pub/Sub. This is the largest
scalable ingestion system that you can find. And it is very global,
and I’ll share with you, and my partners in here will
share their experience as well and how they are doing it. Dataflow is designed to handle
both batch and stream needs, and you do it only once. You code it once. Use it for both of them. And that’s how we
are designing it. And we use all these internally
within the Google family as well. And for all your analyses,
you can use your Cloud ML. You can use your BigQuery. We’re making it as easy as
possible for you to leverage. And open source,
this is in our DNA. Everything we do,
all of our engineers work in the world
of open source. If you happen to be using one of
these– the Kafkas, the Sparks, and Flinks, and so forth– they’re all there. You can easily come and
start using our tools. They’re supported to help you
achieve your real-time analysis world. So on Google Cloud,
for your needs, when it comes to
streaming analytics, you’ll find every tenant
that makes it a reality for you– serverless
architecture, robust and global
scalable ingestion systems, unified streaming and
batch services, flexibility for all your users, and
all the tools that you need for analysis. Now some of you may
have seen this slide. If you look at it, on
the left-hand side, look at the amount
of time that we had to spend to add the value. We want to get to a place
where we’re actually providing insights. We are actually providing
some decision-making, not really tinker
with all the tunings, and do I have the right
set of data in here? Do I have the right connectors? Is it going to scale? Is it going to really have
the right memories in place? In some cases, you
run out of space. You run out of memory. This is all done in a
very scalable, global way automatically for you on
our analytics platform. All of them are achieved
through our serverless platform. It is key to your success. I mentioned Pub/Sub. Take a look at what
you can achieve. How many of you have performance needs at the scale of 100 gigabytes per second, at global scale? And it actually simplifies how you do it. And the way that we do it is it integrates with Dataflow, so that it simplifies your
experience going forward. That is key. And this is done in
every region globally. So you can actually
check that mark and say, I can access to my data. I can manage my events. I can do it at scale. Mentioned Dataflow. It is done in a way to unify
both batch and stream for you. Why is it important? Because we are already seeing
the shortcomings of the skill sets. If you had to have
a skill set, who knows how to deal with
batch versus stream, you’re not going to
be able to scale. This is done. It unifies it. Because batch is not going away. While the streaming is getting
faster and growing faster, batch is not going away. Batch is there. We want to make
sure you can take advantage of both with one set
of code, one set of tool sets, one set of skill sets. And it is fully integrated. I told you, we believe
in open source. Beam, I’m so proud
to mention in here, has won the Technology Award
this year for all the abilities it provided to the community. It is helping you build your use
cases with your language choice to solve your problems at scale. If you haven’t
started using it, I encourage you to
start playing with it. End to end, a comprehensive
platform from ingestion, to storage, to
processing, to analysis– this is what you will
get from Google Cloud when you start working on
your next-generation problems and real-time problems
and streaming problems. Some customers– and
there's a ton in here. We're going to talk in detail about AB Tasty, but GoJek, Ocado, High Games, Recruit, these are just examples from the last six to nine months that are popping up in here. Because they are
seeing, as soon as they start moving and bringing
their data and using our tools, they’re seeing the value. They’re freely sharing this. And I’m so happy to
mention them in here on the stage for all of you. We aren’t stealing your thunder. In the meantime, maybe Remi,
you want to come next to me? I want to introduce
my partners in here. AB Tasty, they’ve
done a fantastic job. Look at the scale, guys– 25 million sessions per
day, 50 billion events analyzed per day, and all
done in 32 milliseconds. This is a scale. Now, Remi is going to
cover what they have done and their journey. Thank you for joining me here. REMI AUBERT: Thank you
very much for inviting us. Sergey and I will try to share valuable insights with you about using Google technologies. To give you a bit
of context, AB Tasty is a platform for marketing and product teams to personalize and to experiment, to test on websites or mobile apps. So we work like a web analytics tool in terms of collecting data. And we apply modifications directly on the website, so we need real-time technology in order to act on client websites. So we will try to share
this experience with you, on a business perspective for me
and on a technical perspective from Sergey, so if you are a technical person, just stay here. Sergey's part is for you. And I hope it will help you
to implement your technology. Just to give you the
global context of our work, I would like to share
with you the most powerful real-time technology
I ever built. That is to say, my daughter. You can be sure that if a
baby bottle comes around, she'll know it instantly. Joking aside, this child will grow up in a world that we already call the fourth Industrial Revolution. So this is a world
where electronics meets the biological aspect and, as well, the digital aspect. Ford used to say that
you can choose any Ford Model T as long as it's black. The fact is, in the fourth Industrial Revolution, we are the opposite of that. We are in a world where consumers have multi-touch points with brands. They have multi-touch points
through multiple devices– so desktop, mobile,
watch, et cetera. They have communication with the
brands through emails, phone, through chat bot as well. If your train is
delayed, you will send a tweet mentioning the brand, et cetera, et cetera, et cetera. So the goal of all of those brands is to
build a one-to-one relationship at scale. Moreover, in this
revolution, the buying habits have changed a lot in
the last five years with the sharing economy,
with the subscription model through Spotify, through
Netflix, or services like that. So we have more and more limitless access to products to consume. So things are moving faster and faster when it comes to payments. And customer loyalty does not exist anymore. The customers you have today, you will have to convince again in two years. For example, if there
is someone having a tattoo with Harley
Davidson in the room, just come to see me at
the end of the session. I offer you a
bottle of champagne. But that kind of loyalty doesn't exist anymore. And as well, a recent
study from [INAUDIBLE] told us that we have, as a
brand, less than one second to convince a consumer
or to get their attention. So this is the context we live in as a brand, or as a company that sells products, or as someone who wants to attract new customers. A lot of companies are not prepared to respond to this endlessly changing customer. And the companies that cannot change, that cannot adapt, will just go extinct. Let's take, for example,
Blockbuster DVD here in the US that disappeared with Netflix,
or let's take the Borders bookstores with Amazon. So those companies that are not able to adapt to this new customer, and those companies that are not able to make modifications every minute, will disappear. We usually hear companies that say, oh, I'm revamping my website, this is a two-year project. We cannot do that anymore in the society and the revolution we live in. So there is a new way,
new manner of consuming. And we can say that companies,
like Netflix, Amazon, Uber have totally changed
the way we consume. And they have set a bar, a minimum level for customers, that is very high. And all companies need
to adapt more quickly. Let's take a quick example from those huge companies that succeed. They experiment a lot
with what we call the 1,000 experiments rule. There is a correlation between
number of experiments you do and your odds of
success as a company. This is where AB Tasty comes in. In order to help brand and
companies to better understand their user, to target them, to
act directly on their website, on the mobile app, to
analyze this traffic, and then to optimize
the whole process. So we help company to make
modification every minute, every second on their
website and to be the best at convincing clients. So for that, we need
to go in real time, and we need algorithms
that will automatically display the right version
for the right user. So here, I just picked an example: let's take the title of an article. We have to determine
in a second which article or which title will
be the best for a user. It’s exactly the same for a
visual aspect of a website. On a travel website,
for example, we have only few seconds to
determine if this user would like to have a travel
around luxury stuff, around sport, or
around cultural aspect, so we can adapt the
visual on the website. So for all that, we
implement algorithms that will display the right
version to the right person. I hope you all played this
game when you were young, but we do exactly the
same at large scale. We try to find the right person and to put the right message in front of them. And we try to do
that in a second. If you take another
kind of personalization we do in real time, it
is to be able to display a number of products which
have been bought in the past and to convince people that they should buy this product as well, which is personalization around products and not around users. And as well, we can do
less technical stuff, like simply inverting steps in a buying process or in a subscription process. Here is an example we did
with UNICEF around inverting two steps of their form. And we increased their revenue by 18% with them. So to quickly tell
you about the company, we have 750 clients, 200
employees, six countries, 14 offices. We have 100% growth
yearly since five years, and we manage a
huge amount of data that Sergey will explain
to you in a minute. And we work with
huge brands in order to optimize their website
in real time, all day long, all year long. And that’s it. So if your logo
is not here, don’t hesitate to come to see me
at the end of the session. And we will go deeper in the
technical aspects with Sergey, who will come on stage to explain how we implemented that at a global scale and how we partnered with Google to manage that. Thank you. [APPLAUSE] SERGEY CHERNYAKOV:
Hi, I’m Sergey. It’s great to be here. And a little about me. I am from Moscow. I live in Paris. And I’ve been with
AB Tasty since 2015. I am also a Hadoop ecosystem enthusiast. I have 12 years of
experience working with back-end architectures. I work at AB Tasty as
the head of foundation. Essentially, I manage the teams
that work on all the data processing and infrastructure
aspects of our company. And as a part of
technical team, I am in charge of software
architecture and application development. Let me tell you about
the technical evolution of AB Tasty since 2009 and
what our next steps will be. So let’s dive into this. In 2009, what did the basic
technology stacks look like? Well, Remi had the idea of
developing an A/B testing tool and wanted to move quickly. So we built our proof
of concept in one week. It’s a monolithic
architecture, with all of the application, database,
and infrastructure services on one virtual machine. At that point, we only
had five small clients, with a small load on
our infrastructure. We had maybe 200 events per second, and we only collected
the data related to our clients’ complaints. So this approach worked
well, until around 2012, when business started to boom. At this point, two
things started happening. First, our clients saw
an increase in traffic. And second, their
expectations increased. They wanted more functionality. And most importantly, they
wanted real-time analytics. Our monolithic approach was,
therefore, no longer effective. And we started to think
about the right tools to satisfy all clients. We also started to experiment
with stream data processing. And at this stage, we
had around 200 clients and little more than
2,000 events per second. And so in 2016, we
tested different tools which would allow us to
improve AB Tasty’s product and increase functionality,
provide us with stability, given the increase
in client traffic, and develop real-time analytics
to meet client demands. We tested a range of
tools, as you can see here, including Spark Streaming,
Kafka, Cassandra. But we found that
they were not adapted to our needs or cost-effective. And so in 2017 and 2018,
we tested the Google Cloud Platform, and what
we really liked was its open-source approach
for me, the developer, and for my team. This is great in terms
of documentation, constant improvement, and so on. And also, from a
business point of view, what Google Analytics does,
it’s very close to what we do. Therefore, we can use the same
tools as Google Analytics, like BigQuery,
Dataflow, Bigtable, Pub/Sub and many others to reach
the same level of performance, which our clients like. So what is our approach today? Thanks to GCP and its
data-streaming services, we have changed our way
of collecting this data. And while we only collect the
data related to our clients’ complaints, now, we
collect all the visitor events, which enables us
to develop new insight. And so today, we can
provide our clients with some real-time
analytics and indicators. And finally, let’s take
a look at our next steps. First, we predict that the
number of events processed per second will increase 15-fold
due to the growth of AB Tasty. We expect to be able to scale
our infrastructure accordingly using Google. Second thing, we
have already started to process our data in
different regions of the world. But tomorrow, we want to
have the ingestion access points and the storage in different regions as well, close to where our
clients are based, such as South America,
Australia, China, many others. And third thing, at
present, we mainly provide our clients
with insight as to how to improve
their conversion rates. And tomorrow, we want to harness
machine learning technology in order to improve this. Thank you. EVREN ERYUREK: Thank you. [APPLAUSE] So we have a little bit
of time, and I actually have some questions in
here for my partners. But we would like to
hear from you as well. So that’s why we saved
this time for ourselves to have some interactions with
you, your needs, and so forth. But to kick things off– and
I think there’s mics in here– please, if you have question,
please, just hop in there and ask the questions. And do we have any gifts
for the best question? We can think of one. We can definitely think of one. All right, but I do have one
question for both of you. So what I understood is that personalization matters. And those tweaks
that you’re making, how does it change the business? How did it impact
from what it was to what it is today
and the results that you’re seeing
as the business CEO, for your customers? REMI AUBERT: Yeah, sure. In the early days of the tool, small improvements were very impactful on client websites. So just changing the color of
a button was enough or changing wording was enough. The fact is, with the
maturity of clients, with evolution of
website, et cetera, we have to be more and
more impactful in what we offer to our clients and
in what our clients offer to their customer. That’s why we have to implement
new features like image match-making, which is
the effect of displaying the right image
to the right user. So we have to find a new way
of catching the attention. And I would say that the
improvements that we can generate for our clients are between a 5% and 15% increase in revenue over a year. So that's pretty huge. EVREN ERYUREK:
That’s pretty good. I saw that example for UNICEF. That’s 18% profit. That’s pretty good
for the decision that you guys made there. Sergey, you said, 2019 to
today, that’s a big change. The scale is impressive. So tell us about the volumes
of data that you’re seeing. How did it change? What happened? What’s the impact? SERGEY CHERNYAKOV: As I said
in my presentation, in 2009, we had our [? MVP ?] built
on monolithic architecture. Like data processing,
dashboard database was on the same server. And as the business expanded,
we built a relational database cluster to store visitors’ data. And we used this approach
for about two years. It’s not perfect for our needs. So we constructed now a
scale data store architecture with Apache Spark
jobs in batch mode that we host on
virtual machines. And we administrate
ourselves, and it’s very painful in
terms of maintenance. So one year ago, we migrated
over to Google and [INAUDIBLE].. REMI AUBERT: We had a
look yesterday with Sergey about the amount
of data we have. And we made a calculation
that every six months, we doubled the size of
the data storage we have. So the increase of volume
we ingest is very important. EVREN ERYUREK: Very important. Very good. So I highlighted several
times that batch and streaming and how having the same set
of code, same sort of tools is important. Can you share with
our guests in here what that impact was when
you used to do it with batch? Now, you’re doing it
real time with streaming, and you didn’t have to change
much of your code there. What was the impact
from business side, as well as from the
technology side? REMI AUBERT: From
a business side, it was more problems
than solution. When you do batch
and the time lapse, you have to process a batch. It’s less than the time you
need to process the batch. At some point, you are stuck. So we had trouble to
compute that batch and to give good experience
for our own clients. So that’s why we tried
to stream as much as we could all the
aspect of the tool. And in terms of personalization
that I showed you in the example, we cannot
do that with batch, as we have to be able to
categorize the user and give new user experience after
the first page [INAUDIBLE] the user. So in batch, we cannot do that,
as people will switch from the first page to the second
page in less than five seconds. So we have to calculate
that in real time. EVREN ERYUREK: I see. Anything from your perspective? SERGEY CHERNYAKOV: Yes, but
for the clients, I think, also, we cannot target our
visitors in real time based on their current behavior
when we use batch mode. It’s complicated. And from a technical
point of view, for me, batch processing is more painful
to administrate, to [INAUDIBLE] or to manage jobs and scheduling. And it also requires more computing resources, because we process much more data at once. EVREN ERYUREK: OK. OK, well, thank you. [MUSIC PLAYING]
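The "right version for the right user" decisions Remi describes are often modeled as a multi-armed bandit problem. The sketch below is a minimal, hypothetical epsilon-greedy illustration of that idea, not AB Tasty's actual algorithm; the variant names and statistics are made up:

```python
import random

def choose_variant(stats, epsilon=0.1, rng=random):
    """Epsilon-greedy selection among page variants.

    `stats` maps a variant name to (conversions, impressions).
    With probability `epsilon`, explore a random variant;
    otherwise, exploit the best observed conversion rate.
    """
    if rng.random() < epsilon:
        return rng.choice(list(stats))
    return max(stats, key=lambda v: stats[v][0] / max(stats[v][1], 1))

# With exploration turned off, the variant with the better observed
# rate (B: 4.5% vs. A: 3.0%) is always chosen.
stats = {"A": (30, 1000), "B": (45, 1000)}
print(choose_variant(stats, epsilon=0.0))
```

The epsilon term is what keeps a live system from locking onto an early winner: a small fraction of traffic keeps testing the other variants, which matters when user behavior drifts over time, as the talk emphasizes.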

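The unified batch-and-stream model both speakers emphasize comes down to applying one computation to bounded and unbounded inputs alike. The following is a simplified, standard-library-only illustration of that idea (fixed event-time windows, as in the Beam/Dataflow model), not Dataflow's actual implementation:

```python
from collections import defaultdict

def count_per_window(events, window_size):
    """Count (timestamp, payload) events in fixed event-time windows.

    The same function works whether `events` is a finite list (batch)
    or a generator draining a live feed (stream) -- the unification
    the Beam model provides at far larger scale.
    """
    counts = defaultdict(int)
    for ts, _ in events:
        # Bucket each event by the start of its window.
        counts[ts - ts % window_size] += 1
    return dict(counts)

# Four clickstream events bucketed into 5-second windows.
clicks = [(1, "view"), (3, "click"), (7, "view"), (12, "buy")]
print(count_per_window(clicks, 5))  # {0: 2, 5: 1, 10: 1}
```

Writing the aggregation against an event iterator rather than a stored table is the design choice that lets one codebase, and one skill set, serve both the batch and the streaming path.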