Plenary session
24 May 2016
9 a.m.
CHAIR: Hello everybody. Welcome to day 2 of RIPE 72. I'm Mike Hughes, I am a member of the Programme Committee. I am the Chair for the first 90 minutes of this morning. I drew the short straw and had to go to bed early last night, and Marcus is helping me down at the front as well. We have got three great talks. Everybody likes a little bit of network automation these days, so I am going to ask Mircea from CloudFlare to come up. Before that, just a little bit of housekeeping. If the fire alarm goes off, run for the hills, out the doors with the green signs. The other thing is, obviously, when it's time for questions, please come up to the mike and when you do please state your name and affiliation. Remember that the session is being webcast. Thanks very much.
MIRCEA ULINIC: Hello, I am a network engineer at CloudFlare. Probably most of you have already heard of us; we are trying to make a better web.
We in the network team believe that a better web also implies a better Internet. Our network carries the traffic for between 15 and 20% of the total Internet users. That said, if we can improve our network, we can have a direct impact on the web and we can have a pretty good slice of it improved.
Today, I am going to present to you some tools that we are currently using to improve our network. They are Open Source, so you can use them as well. And we can, together, work and improve the web.
I want to make my point clear from the very beginning. This is still a work in progress, the project is not yet mature.
Just a couple of things about us. We are a CDN and, for less than one month now, we are also an origin certification authority. We serve more than 40 billion DNS queries on a daily basis. All this is supported by a global network consisting of more than 80 locations, and this is growing. By the end of the year, we'll have more than 100 locations. This growth comes with a couple of challenges for us.
Deploying new POPs is pretty clearly a challenge. It is even more challenging when you do that manually, and you have a target rate of deploying two new POPs per month. The human error factor is another challenge. We are humans, we have made mistakes, we will forever make mistakes, but machines don't make mistakes if they are instructed properly. I attended a presentation some time ago where the speaker had a very interesting point of view, and I can only agree: if you automate things in your network, you can keep your errors consistent and you can fix them afterwards, while human errors are inconsistent and unpredictable. From time to time we need to replace equipment. When you replace equipment with a device from the same vendor, this is, let's say, fine: you replace a couple of thousand config lines, you adapt them to the new device and that's all. Imagine when you need to replace a device with one from a different vendor. That becomes a nightmare.
We have a vital need to monitor our network. It's growing, and you can lose track of important details of your network. Also, by monitoring you can detect things that are wrong in your network, which you can then fix. This is why we decided that we need network automation at CloudFlare. The first logical step is to choose the right automation framework. This is why we listed a couple of requirements for this framework. We wanted it to be very scalable, because we wanted it to be able to manage the network devices for the current network, as of today, but also for the future network, which will be much bigger. Our network engineers are now distributed across the world and they need to access the resources of the network at the same time without any errors or conflicts. It must be easily configured, customisable and adaptable to our specific needs.
From time to time, in order to overcome the human error factor, we need to verify the configs and enforce them. For monitoring, collect the statistics on the network and to cache some important details. Also, it was very important for us to not reinvent the wheel again and again. There are many frameworks that provide you with tools already done. It's pointless to write, for example, the same driver that allows you to connect to a database again and again, like many others did before.
On the network side, there are a couple of available solutions. The most used are Ansible, Salt, Chef and Puppet. Among them, the most used seem to be Ansible and Salt. Chef and Puppet have some features, but their list of features is a bit shorter. So from now on we'll analyse Ansible and Salt.
Before going any further, let's have a look at the opinions of two people who have worked with both Ansible and Salt. They have experience, and Ryan Lane says that "the learning curve for Salt is higher and the intro docs are rough, but in the long term Salt's docs are much better than Ansible's because they are way more complete." I cannot say it better than that. In the beginning, it was a bit harder to get used to the logic of Salt and the files; after that it just comes naturally. When I say "in the beginning", I mean pretty much the first two or three days.
[Yans] says something very interesting as well: "To me, Ansible was a great introduction to automated server configuration and deployment. Moving forward, the scalability, speed and architecture of Salt has it going for it."
I think we can apply the same pattern in our world as well. We have already been introduced to this world of network automation. I think it's now time to make it scalable and fast. It's time to go one step further.
But wait: Salt, although it seems to be a very good solution on the server side, is not considered in our network world. For example, the first document, from Cisco, mentions a couple of possible automation frameworks like Puppet, Chef, Ansible, and somewhere at the end an "etc.", which I doubt includes Salt as well.
The second document is from Juniper; it excludes from the beginning any possibility of using Salt, and it is entitled automation of Chef/Puppet and Ansible.
Why is that? Why is Salt a very good solution on the server side and not in our world? If you go back, you can see that the document from Juniper is completely outdated. It was published more than one‑and‑a‑half years ago, and 18 months in our world is a huge amount of time. In this time, many things can happen. One of the things that happened is that Salt introduced all the features that you need to control your network devices.
It's surprising that many people don't know about that. But that's fine, you can't know all the possible details. And I'm asking you now, how many of you know that you can control network devices with Salt? One person? Two persons? I expected that. But this is why I'm here today, to show you the truth. To show that you can control network devices with Salt and you can do it very well. What is even more surprising is that many people didn't want to know that. And the right attitude, I think, is that you have to be open‑minded to all possible solutions and choose what's best for the business. No matter the name, no matter if it's called Ansible, Salt, or whatever.
It is true that Salt is a very powerful beast. It comes with its costs as well: it's preferable to run it on an external server, so if you want to do network automation for a couple of virtual machines on your computer, controlled by a remote server running Salt, it might be just a waste of resources.
We have experience with Salt. We have used it for years and as of today we control many thousands of servers with it, and having it for both sides, for servers and also for the network devices, would be simply great. Moreover, it comes with a bunch of benefits for us.
Let me draw a parallel between Salt, which is what best fits our needs, and what is most used as of today in network automation, namely Ansible.
In Salt, you open the connection with the device just once, at the beginning, and then whenever you need the resources of the network, you just go and get them. In Ansible, by contrast, you open the connection with the device, issue a command or a set of commands, then you close the connection. For us this doesn't make much sense. It's like you open a terminal, type show interfaces, close the terminal, open it again to show BGP neighbours, and close it again. This cannot scale. Moreover, in Salt you can have 20 types of modules, all customisable. We also know that Salt is extremely scalable: as I said, it manages many thousands of our servers around the world. It comes with many embedded features and tools that you can just use. You don't need to reinvent the wheel again and again. It has the config enforcement logic that I talked about, to overcome the human error factor. It comes with a list of features that you can see there on both sides. The difference is that while in Salt you have them for free, in Ansible, if you want real‑time jobs, job scheduling, a REST API or Git and SVN integration, you need to buy Ansible Tower, and it's not cheap.
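The connection-handling difference described above can be sketched in a few lines. This is only an illustration: `FakeDevice`, `run_reconnecting` and `run_persistent` are hypothetical names invented for the sketch, not part of Salt or Ansible, and the device is simulated rather than contacted.

```python
class FakeDevice:
    """Simulated network device that counts how many sessions were opened."""

    def __init__(self):
        self.sessions_opened = 0

    def open(self):
        self.sessions_opened += 1

    def close(self):
        pass

    def run(self, command):
        return f"output of {command!r}"


def run_reconnecting(device, commands):
    """Open and close a session around every single command."""
    results = []
    for command in commands:
        device.open()
        try:
            results.append(device.run(command))
        finally:
            device.close()
    return results


def run_persistent(device, commands):
    """Open one session up front and reuse it for every command."""
    device.open()
    try:
        return [device.run(command) for command in commands]
    finally:
        device.close()


commands = ["show interfaces", "show bgp neighbors"]
a, b = FakeDevice(), FakeDevice()
run_reconnecting(a, commands)
run_persistent(b, commands)
print(a.sessions_opened, b.sessions_opened)
```

Both helpers return the same outputs; only the number of session set-ups differs, and that gap grows with every extra command.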
As I said, Salt has 20 types of modules, all customisable. The most used are the execution modules, which basically send the command to the device and return the output.
The grains help you make a more granular selection of the devices you target. It is simply not enough to select your devices only by the host name. We all know there are vendors that provide you a set of commands that work properly on a specific platform, on a specific version of a specific operating system. When you upgrade, you might find out that those commands do not work any more. This is why grains are so important. They help you select only those devices on which you know that those commands work. The states come with that config enforcement logic. In the runner, you take the output of unlimited modules, mix them together and display the output you want. Pillars: you need to know that the configuration files in Salt are called pillars. Salt has pushed the barrier even further and those config files are dynamic. When I say dynamic, I don't mean a Jinja template or something; I mean that the file itself and its contents are built dynamically. We, for example, build those files dynamically from an external Git server. We never write those config files manually. The returners help you forward the output of the modules to an external service, like a database and so on.
As I said, it comes with embedded modules that you can just use to communicate with NoSQL or SQL databases, or chats, or brokers and so on. The same goes for returners. You can forward the output of the modules to a different service. And they are extremely easy to use. At the end you just append dash‑dash return and the name of the returner. For example, on this slide, you can see that just appending dash‑dash return SMS will forward the output from this device and will send you a message on your mobile phone.
Let's have a quick look at the architecture of Salt. You have a server that is running an instance of the master. You need to install on this server a software package called Salt master. That master controls a couple of devices, the minions, communicating with them via a very fast communication channel based on ZeroMQ. On these minions is installed a software package called Salt minion. The problem is that, as you know, some vendors still don't allow installing custom software packages on their devices. This is why Salt comes with a solution that allows you to control your devices by connecting through a very good library. This feature is called the Proxy Minion, which basically emulates a real minion.
The disadvantage of this approach is that Salt forks a process per device. That being said, if you have a network of hundreds of devices, you will have a server with hundreds of processes forked.
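The idea of selecting devices by their properties rather than by host name can be sketched as a simple filter. This is a minimal illustration, not Salt's implementation: the inventory, the grain names and `match_grains` are assumptions made up for the example.

```python
# Hypothetical inventory: device name -> grains (static facts about the device).
INVENTORY = {
    "edge01.sjc": {"os": "junos", "model": "MX480", "role": "router"},
    "edge02.lhr": {"os": "junos", "model": "MX960", "role": "router"},
    "core01.fra": {"os": "iosxr", "model": "ASR9K", "role": "router"},
    "sw01.ams": {"os": "nxos", "model": "N9K", "role": "switch"},
}


def match_grains(inventory, **wanted):
    """Return the names of devices whose grains equal every wanted key/value."""
    return sorted(
        name
        for name, grains in inventory.items()
        if all(grains.get(key) == value for key, value in wanted.items())
    )


print(match_grains(INVENTORY, os="junos"))
print(match_grains(INVENTORY, os="junos", model="MX480"))
```

Adding a second grain narrows the selection, which is what keeps a platform-specific command away from devices where it may not work.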
For us, the good library that allows to connect to the devices is NAPALM. NAPALM basically is a wrapper of drivers for different operating systems. It's a very fast‑growing library. On the left side, you can see the list of the features available at the beginning of February this year. On the right side, you can see the list of features available now after CloudFlare began contributing to this project, and will continue contributing.
Having that Proxy Minion feature on the one hand and the NAPALM library on the other hand, we mixed them together and we announce here today, at this RIPE meeting, our Open Source recipe called NAPALM Salt. With this, we have integrated NAPALM in Salt and from now on when you install Salt on your machines, it comes with all the features you need to have network automation. Basically, you have network automation now in two steps: install and use. That's all. You don't need to write one single line of code. We have introduced a couple of execution modules: net, which allows you to retrieve the Arp table, the Mac address table, and so on; BGP, which allows you to retrieve the BGP configuration and statistics; NTP, which gives you the configuration of NTP and the synchronisation status; and probes, which gives you the results and the configuration of the RPM or SLA probes. For the last two, we also provided states, and now with a very simple YAML file you can control very easily the configuration of your network devices.
Let's have a look at some examples. When you execute a module in Salt, you begin your command with Salt. Then you select the devices you target. You can select by host name, but not only by the exact host name: you can also apply regular expressions. In the first command you execute traceroute on all edge devices. In the second command you select only the devices running JunOS and you execute show version on the CLI. The third command retrieves the Arp table from all switches running NX‑OS. The fourth command retrieves the Mac address table only from those devices running IOS XR. The fifth retrieves the RPM probe results from all Juniper MX 480 devices and the sixth sets a list of NTP addresses on all routers.
This is an example. Executing the module net and retrieving the Arp table, just targeting the device edge 05, retrieves the list of entries from the Arp table. You can note this dash dash output equals JSON. With this, Salt has a couple of embedded returners that help you render the normal output in a different format that you want. There are many other formats available like JSON, and so on.
Salt helped us to have abstracted configurations. No matter if it's Juniper or Cisco, now all configurations look the same. You can see BGP.neighbour and then the details like IP, group, description and so on. Maybe it's not obvious enough why this abstraction layer is so important, so let's have a look. Suppose you have a device with 1,000 BGP peers. This device is manufactured by vendor A. At some point, for one reason or another, you need to replace this device with a different one manufactured by vendor B. What happens? This will be most network engineers now, trying to manually configure and bring up all 1,000 BGP sessions. And this is us. We just need to update the file and say hey, it is no longer vendor A, from now on it's vendor B. Job done. That's all we need to do, and Salt works hard for us now.
Behind the scenes there is one thing you need to do to check that. Let's have a look at a simple example. You define the list of NTP peers in a pillar file. Then you schedule a job. This is how Salt knows to plan a job, like in the crontab. This job is called NTP underscore config, for example. This is only for you, to identify which state has been run. The function is state.sls; this is how a state type module is executed. Which state? router.ntp. When? Every day.
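Why the abstraction layer pays off when a device is swapped can be shown with a small sketch. Everything here is hypothetical: the two configuration syntaxes are invented placeholders for "vendor A" and "vendor B" and do not reproduce any real vendor's CLI, and `render_config` is not a Salt or NAPALM function.

```python
# Vendor-neutral description of BGP neighbours, in the spirit of a pillar file.
NEIGHBORS = [
    {"ip": "192.0.2.1", "remote_as": 64500, "group": "peers", "description": "example peer"},
    {"ip": "192.0.2.2", "remote_as": 64501, "group": "transit", "description": "example transit"},
]


def render_vendor_a(neighbor):
    # Invented syntax standing in for vendor A.
    return f'set bgp group {neighbor["group"]} neighbor {neighbor["ip"]} peer-as {neighbor["remote_as"]}'


def render_vendor_b(neighbor):
    # Invented syntax standing in for vendor B.
    return f'bgp neighbor {neighbor["ip"]} remote-as {neighbor["remote_as"]} group {neighbor["group"]}'


RENDERERS = {"vendor_a": render_vendor_a, "vendor_b": render_vendor_b}


def render_config(vendor, neighbors):
    """Turn the same abstract neighbour list into one vendor's config lines."""
    render = RENDERERS[vendor]
    return [render(neighbor) for neighbor in neighbors]


# Replacing the device means changing one value; the neighbour data is untouched.
for vendor in ("vendor_a", "vendor_b"):
    print(vendor, render_config(vendor, NEIGHBORS))
```

With 1,000 neighbours the data file stays the same size and shape; only the `vendor` value changes.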
This is the output after a state was run against a device. It says that an NTP peer has been removed, two NTP servers have been added and three others have been removed. If you go back, you can also see, underneath NTP underscore config, that it says returner. That being said, the returner SMTP will take the output of the run against all devices you targeted, put it in an e‑mail and send it to us. Basically Salt now goes every day and checks for us if the configuration is okay. If it's not, it will fix it by itself and, at the end, will send an e‑mail to us saying: this is the summary of the changes I made.
What else can you do? It was in March when a colleague asked me for the list of unique ASNs we are peering with per geographic area. It took me just a couple of minutes to build a runner. You can notice that the command now begins with Salt dash run. This is how you identify that a runner was called. Inside the runner, as I said, you put together the output of unlimited modules of unlimited types. You can select the devices you target using the grains or the pillars, and then you throw the output on the console, or you can also use a renderer or a returner to store it in a different service. It's remarkable that the execution time is only 2.8 seconds. That means this runner collected the information from all devices around the world and displayed it in only 2.8 seconds. How cool is that?
We also started finding stuff in our network; as I said, you simply can't store so many details in your memory. Searching with a different runner, net dot find, for a random string is just easier. It says that it found that pattern in the description of a couple of network interfaces. Searching with a Mac address, it says that it matched the physical address of an interface. Searching with a different Mac address, it says that it found an entry in the Arp table of edge 05 in Copenhagen.
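The enforcement step described above, where the pillar is the desired state and the device is brought in line with it, boils down to a set difference. The sketch below is an assumption-laden illustration: `reconcile` and `apply_changes` are hypothetical helpers, the addresses are documentation-range examples, and no device is contacted.

```python
def reconcile(desired, actual):
    """Compare desired peers (pillar) with configured peers (device)."""
    desired, actual = set(desired), set(actual)
    return {
        "added": sorted(desired - actual),
        "removed": sorted(actual - desired),
    }


def apply_changes(actual, changes):
    """Return the peer list the device would have after the changes."""
    return sorted((set(actual) | set(changes["added"])) - set(changes["removed"]))


pillar_ntp_peers = ["192.0.2.1", "192.0.2.2"]
device_ntp_peers = ["192.0.2.2", "198.51.100.7"]

changes = reconcile(pillar_ntp_peers, device_ntp_peers)
print(changes)
print(apply_changes(device_ntp_peers, changes))
```

Running `reconcile` again after the changes are applied yields two empty lists, which is why a daily scheduled run only reports something when the device has drifted.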
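The aggregation such a runner performs, combining per-device results into one answer, might look roughly like this. The data is made up (ASNs from the documentation range, invented device names and regions), and `unique_asns_by_region` is a hypothetical helper rather than the runner shown on the slide.

```python
from collections import defaultdict

# Hypothetical per-device results, as if already collected from each device.
DEVICE_RESULTS = [
    {"name": "edge01.dub", "region": "Europe", "peer_asns": [64496, 64497]},
    {"name": "edge01.lhr", "region": "Europe", "peer_asns": [64497, 64498]},
    {"name": "edge01.nrt", "region": "Asia", "peer_asns": [64496]},
]


def unique_asns_by_region(results):
    """Merge the peer ASNs of all devices into one sorted list per region."""
    regions = defaultdict(set)
    for device in results:
        regions[device["region"]].update(device["peer_asns"])
    return {region: sorted(asns) for region, asns in sorted(regions.items())}


print(unique_asns_by_region(DEVICE_RESULTS))
```

The sketch covers only the merge step; in the setup described in the talk, collecting the per-device data is what Salt's execution modules do.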
We now have a different tool to search where we have BGP peerings with some ASNs. Just type in bgp.neighbours and then the list of ASNs. For example, searching for the ASNs of Google, Amazon, Facebook and Twitter says that we have peerings with them in Dublin, Tokyo, London, ASN Sao Paulo and so on. We also started monitoring our network. We now run traceroutes every two hours between all our colos and we monitor our transit providers' networks. It was very easy to set up where to store these results: just specify the host and the port, and then the traceroute will be run every two hours, again with another schedule instruction. It has populated this instance with many thousands of traceroutes. As you can see, between edge 01 in San Jose and London via Tata, between Ashburn and ASN ‑‑ San Jose ‑‑ sorry, Frankfurt and Seattle, and so on.
Why is this important? Whenever we have an issue now, we can compare the traceroute taken during the issue with the last traceroute before the issue happened. And it's much easier now to see if the traffic has changed path or if a node has begun to reply with a higher latency. Even more good news for you: you can install all these awesome features yourself. From now on you just need to install Salt master, either using the aptitude command or by following the install guide, and then you only need to install the underlying library, NAPALM. I provided some examples at the link at the bottom of this slide; just follow the link and you'll see that it's very easy to set up your environment.
Now, we will continue contributing to these two awesome projects. We will contribute things that cover our needs. We can't cover all possible needs in this world. But the things that help us, you will have available. If by any chance there is something else that you need, that is more specific to your business and is not here, and you have the power to write code, please contribute to those two projects and help others after you who might have similar needs to yours.
If you have any questions or you need some help or advice, either on how to install, how to use or how to contribute to those two projects, you can e‑mail me or my colleague, Jerome Fleury, or find us in a Slack chat; follow that link to subscribe. There are two rooms that might interest you; they are called SaltStack and NAPALM.
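The before-and-after traceroute comparison mentioned above can be sketched as follows. The hop format (IP address, round-trip time in milliseconds), the `latency_factor` threshold and the function name are all assumptions for illustration; the stored format of the real traceroutes is not shown in the talk.

```python
def compare_traceroutes(before, after, latency_factor=2.0):
    """Compare two traceroutes given as lists of (hop_ip, rtt_ms) tuples.

    Reports whether the path changed and which unchanged hops now answer
    more than `latency_factor` times slower than before.
    """
    path_changed = [hop for hop, _ in before] != [hop for hop, _ in after]
    slower_hops = [
        hop_after
        for (hop_before, rtt_before), (hop_after, rtt_after) in zip(before, after)
        if hop_before == hop_after and rtt_after > rtt_before * latency_factor
    ]
    return {"path_changed": path_changed, "slower_hops": slower_hops}


before = [("192.0.2.1", 1.0), ("198.51.100.1", 10.0), ("203.0.113.9", 80.0)]
after_slow = [("192.0.2.1", 1.1), ("198.51.100.1", 45.0), ("203.0.113.9", 82.0)]
after_rerouted = [("192.0.2.1", 1.0), ("198.51.100.99", 12.0), ("203.0.113.9", 95.0)]

print(compare_traceroutes(before, after_slow))
print(compare_traceroutes(before, after_rerouted))
```

The first comparison flags a hop that became slower on an unchanged path; the second flags a path change, the two symptoms the talk says are easier to spot once a baseline is stored.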
If you have any questions?
AUDIENCE SPEAKER: Jen Linkova. Thank you. Very interesting. I have one minor comment and a question. So you mentioned that Salt keeps the connection open, right? I'd just like to comment that in some cases it might be a problem, because unfortunately there are devices that allow a very low number of connections. And the question is: it works great when you already have your network properly described in an abstract model. Could you share your experience of the migration to that stage, from the situation where you have a number of devices around, just configured, to a stage where you only touch the abstract model and people do not go into devices, configure something and go away without updating the model?
MIRCEA ULINIC: You don't need to update the model. The model will update the device.
JEN LINKOVA: Let's say you have an engineer who decided to trouble shoot something and he decided to jump on the device and make some changes and go away, right?
MIRCEA ULINIC: I didn't want to go into so much detail. You will restrict access to the devices and everything will mostly be configured from those files, and we will have restrictions on those files. They will be stored in a remote Git server, and whenever you need to make an update to the config, you need to make a request. This request is reviewed by a couple of people. This way you can also have a history overview. Then, when this pull request is merged, there's a trigger that says hey, this file has been changed, now update the configuration of the device. And having the files in a Git server, you can manage much more easily the access restrictions that you allow per user.
JEN LINKOVA: Basically you do not allow changes on the devices themselves.
MIRCEA ULINIC: No.
JEN LINKOVA: One question, how do you make sure that the change you are going to push from a repository is safe?
MIRCEA ULINIC: Because it will be reviewed by a couple of other people.
JEN LINKOVA: So, do you always push just one change, or do you merge two changes together and push the result?
MIRCEA ULINIC: It depends. It's still an ongoing project; we haven't experimented with that so much yet. So, we'll see.
AUDIENCE SPEAKER: Hi, I am Alex Band from the RIPE NCC, I have got an online question. Three parts from Marco from DENIC. He would like to ask, how do you connect all of your devices with NAPALM? Are you using netconf over SSH? Paramiko or something else?
MIRCEA ULINIC: As I said, NAPALM is basically a wrapper of drivers. There are drivers like PyEZ that use NETCONF. There are others using SSH and others that use, yes, as he said, somewhere underneath at the bottom, Paramiko.
AUDIENCE SPEAKER: Okay, so, concerning Paramiko, do you also see performance issues in conjunction with large answers like, for example, a BGP full feed or table dump? And last but not least, thanks for NAPALM Salt and the execution modules, they are totally awesome.
MIRCEA ULINIC: As I said, NAPALM is a wrapper of drivers, and we don't manage the underlying libraries. I don't know much about Paramiko. The devices I have experience with didn't need a connection using Paramiko, so I can't answer these questions. Paramiko is used by the driver for IOS devices and we don't have any IOS, so I don't have an answer for this question.
AUDIENCE SPEAKER: Okay. Thank you.
AUDIENCE SPEAKER: Hello, Andrea from GRNET. Can you scroll back to the slide with the abstraction for the two routers? Okay. On the left, there is the configuration of a BGP neighbour; if you have two BGP neighbours you repeat this twice, three times, etc.? You put the configuration with Salt. What happens when you want to remove a neighbour? How does Salt handle this? Is it also able to remove a configuration?
MIRCEA ULINIC: The pillar file is authoritative. Salt checks that the device has only what is in that pillar file. If something does not appear in the pillar, it will be removed. If something new appears, it will be added.
AUDIENCE SPEAKER: Benedikt Stockebrand. I have got more experience with Puppet, and mostly servers, but when it comes to networks, there is also this nasty chance that you make one wrong move and you can't reach the devices you want to manage any more. Really nasty with network devices, much simpler with servers. Do you have any contingency plan, sort of thing, besides doing reviews to avoid this step in the first place, or are you really screwed when this happens?
MIRCEA ULINIC: Because this is not yet fully pushed into production, we haven't yet had this kind of issue. Thank you for your suggestion. We have to keep that in mind as well. But, as I said, all of the files will be reviewed by a couple of people, not only one, and in the end, when you do that manually on the device, you can make the same mistakes, and it's only one pair of eyes making the mistake.
AUDIENCE SPEAKER: Well, the first thing is, if you use a tool like this, you won't screw up one router or whatever, you'd probably screw up a couple of thousand or tens of thousands whenever things go wrong. You can review as much as you want, you won't see the stupid typo, especially when IP addresses and things are involved. So my suggestion is really, find some sort of contingency plan, a separate management network, whatever it is, so when these things happen, you can actually get to the point where you can do an updated Salt run or whatever, and get the systems back to work. It can be really scary when those things go wrong in the wrong place.
MIRCEA ULINIC: Yes, I don't expect it to be perfect. So... thank you.
AUDIENCE SPEAKER: It's just with network devices it's much harder than with just regular servers where all these tools come from.
MIRCEA ULINIC: Yes, you are right. Thank you.
CHAIR: Okay. Thank you very much.
(Applause)
HOSSEIN LOTFI: Hi everyone. I am Hossein, I am here on behalf of Google technical infrastructure. Last time I was at a RIPE meeting was 14 years ago so it's really good to be back here. I have packed a lot of slides, so apologies for high rate of words per second.
What I want to start with is to give you an overview of SDN evolution in Google and talk about how we move data sets around. If you look at this chart, it shows the challenges. Most of the innovations, and the reason that we got into this business of SDN, were driven by this fact that in just six years you can see that the traffic that was generated by our servers grew by 50X. So we did have to come up with technologies, most of them not available outside of the company, to be able to handle this kind of traffic. If you look at this timeline, it will show you some of the technologies that we have actually talked about publicly; there's a paper that talks about them. But the timescale is interesting. I'm going to be mostly focused on Jupiter in this talk. But I encourage you, if you have time, at the end there are links to these papers, take a look at them as well if you haven't already.
So, what happens inside our networks is that in cluster fabrics we have some sort of ToR aggregation, we have edge aggregation blocks, and spines that connect these together. All of these technologies are about different topologies and connecting these different stages together. We started back in '04, when we had a simple four‑post design, and again the scale on the Y axis is important here: it shows how we grew from one terabit per second to early in 2012, when we introduced Jupiter, which can do 1.2 petabits per second. It earned its name due to its scale and size. It provides 40 gig connectivity to the servers. It has external control servers. And it's driven by OpenFlow with some proprietary attributes.
Some of the characteristics of these fabrics, and I want to emphasise here that it's the same way that we do computation: Google does not use super computers, we go and deploy commodity servers and we get the power just by the scale and sheer volume of deploying these systems. The same applies in the networking world. We are using commodity network gear and chip sets. The chip set that is in Jupiter, chances are you are using it as well. Which brings an interesting attribute: these are shallow‑buffer switches and routers. We do have tiny round‑trip times in Jupiter, and due to the multipath nature of these links, you have massive multipath. It helps with availability, and makes some of the other aspects of the network interesting.
And one thing I want to emphasise is that we do have one common platform. All Google services, internal and external, run on the same fabric. If you are a Google Cloud customer, your packet is travelling on the same physical infrastructure that Gmail packets are travelling on.
We did expand our SDN to our private wide area network. The code name is B4; there is a detailed talk about this. If you search B4 on YouTube, there is a talk that discusses how this network is laid out. But one thing I want to emphasise here, which is kind of an interesting aspect of this SDN network for me, is that the host is also part of the SDN system. The host is actively talking to the SDN controller, reserving bandwidth, requesting traffic, and they are kind of in agreement on how the packets should be marked, and the controller also dictates how the host should rate limit the traffic.
We do have one virtualisation layer on top of everything that I described. You can create a slice in this network and then run private networks there and get all the attributes of DDoS protection and all that stuff. It's called Andromeda.
With that, let me tell you what's happening and what our current focus is, because the story ‑‑ this was kind of the past story, but it hasn't ended and we're continuing in this SDN world to overcome the following challenges.
So we came up with three different, if I want to summarise, waves of Cloud computing, and for this audience it should be fairly straightforward. I mean, in the last decade people were realising the value of virtualisation. We call this Cloud 1.0, where you tried to virtualise more and more of your applications. Then what's happening today is Cloud 2.0: people are moving services into the public Cloud and they are not very much focused on the hardware; you can basically get hardware as a service. What we think is happening next, and that's the big challenge for Google TI at this point, is that we will have to get out of the business of delivering servers and virtual servers in the Cloud and we have to deliver compute. There are challenging applications like machine learning, and we want to be able to get to a point where you don't have to worry about the memory and storage and the CPU in those systems; you just want to get your machine learning algorithms done, and we want to be able to provide that in the Cloud.
These are the challenges that we have today in order to make that vision a reality. There's a lesser‑known law about parallel computing, Amdahl's law, which says that for every 1 megahertz of compute power you are going to be looking at 1 megabit per second of I/O. Which is scary. First of all, that means that the fabric has to be in perfect balance. You can't have part of the network congested and the other side sitting idle; you are losing resources. The other challenge is, if you apply this to, say, a server that has 64 2.5 gigahertz CPU cores in it, and say you have 50,000 of these, which is not even the total size that Jupiter can handle, then it becomes a really scary number. That means that for just the CPU‑to‑storage traffic you are looking at 5 petabits per second. Even if you are over‑subscribed you still come up with 500 terabits. The same applies to the latency. If you want to access Flash or NVM, the latency has to be within 10 microseconds to 100 microseconds. That means there's no room for buffering. There is no room for packet loss, congestion, retransmission. The fabric has to be non‑blocking and always delivering these packets a hundred percent of the time.
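The arithmetic behind these figures can be sketched as follows, under two assumptions that are not stated outright in the talk: that the server in the example has 64 cores at 2.5 GHz each, and that "over-subscribed" means a 10:1 ratio (inferred from the 5 petabit and 500 terabit figures). The function names are hypothetical.

```python
def server_io_gbps(cores, ghz_per_core, mbit_per_mhz=1.0):
    """I/O demand of one server under the 1 Mbit/s per 1 MHz rule of thumb."""
    total_mhz = cores * ghz_per_core * 1000
    return total_mhz * mbit_per_mhz / 1000  # Mbit/s -> Gbit/s


def fleet_io_pbps(per_server_gbps, servers):
    """Aggregate demand of the whole fleet, in petabits per second."""
    return per_server_gbps * servers / 1_000_000  # Gbit/s -> Pbit/s


per_server = server_io_gbps(cores=64, ghz_per_core=2.5)
print(per_server)                        # Gbit/s per server, taking the rule literally
print(fleet_io_pbps(per_server, 50_000)) # Pbit/s for 50,000 such servers
print(fleet_io_pbps(100, 50_000))        # Pbit/s if each server is rounded to 100 Gbit/s
print(fleet_io_pbps(100, 50_000) / 10)   # Pbit/s at an assumed 10:1 oversubscription
```

Taken literally, the rule gives 160 Gbit/s per server and 8 Pbit/s for the fleet; the 5 Pbit/s quoted in the talk corresponds to a rounder 100 Gbit/s per server, and a tenth of that is the 500 terabits mentioned for the over-subscribed case.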
Availability: These are two figure megawatt facilities. We can't turn down the entire facility to add or expand or do repairs there. So, the system has to be built in a way that you can change wheels on the fly, right. And these are the challenges that we're trying to overcome right now.
Eventually, the vision is to make the network disappear. You want your compute nodes and resources to be able to access storage across the fabric without having to worry about any of the network characteristics. My take on this vision is that, basically, speed of light should be the only limiting factor between your compute and storage.
All right. So, with that overview, let's jump into the telemetry. We have talked about five different generations of fabrics. You have multiple switches in these fabrics and the story becomes interesting because you have different types of customers in these fabrics. You have Gmail, which has certain attributes. You have voice traffic, you have external facing Cloud traffic. There's no way that you can look at just the network devices to get an idea about how the network is performing. The state is scattered across multiple stages. Sometimes you have to probe into the applications or stacks or what's happening inside the switches to understand how the network is doing. And given the size of Jupiter, it's practically impossible to get an idea about the visualisation of this, right. So I have prepared three different examples for you and there are three different telemetry applications that I want to talk about. If you look at the life of a data centre, the way that it's deployed, the first phase is building the data centre. We'll come up with different models as a result of these tools. These models, they can give you an idea about the topology, but they can get very detailed and they will give you the actual wiring instructions between the two points on the network as well. Then, once you deploy this network, it comes to connectivity. You want to make sure that the entire fabric has routing connectivity. And at the end, it all comes to operating and making sure that we deliver what the network promised. In each one of these phases I want to talk about three different verifications that we want to do. In the build, you want to make sure that the network is built according to the topology that we initially wanted it to be. In the connectivity phase you want to make sure you don't have loops and black holes in the fabric. When it comes to the operation you want to make sure that you are meeting the SLA.
That seems simple, but everything gets so compounded by scale. For the topology verification we're talking about 250,000 links. More than 10,000 switches in a fabric. For routing and connectivity, it's quite common to have more than 10 million routing entries in these fabrics, and every little incident in the network can easily create a burst of up to 30,000 routes in the fabric. On the SLA side you know the numbers, think about YouTube and the demand that they have.
Let's get into those three examples. To verify that the topology is ‑‑ the work is done according to the topology. One of the things we do is that we say okay, we can't trust the SDN controller yet. You want to make sure that the topology is deployed correctly. What happens is that we do have a kind of a source routing mechanism in our networks that is independent of the SDN. The idea is that, ahead of time, based on the way that the source routing is set up, you mathematically pre‑calculate what the graph should look like; then the entire network sends a full mesh of traffic both ways and you see which responses are missing from the return traffic. And by deducing what's missing there, you can actually narrow it down and get to the point that you say okay, this link is miswired, or the switch is not behaving the way it's supposed to. And on the routing side, loops and black holes, although they seem rare, in this size of fabric we have to always be conscious and make sure that it's loop‑free and there are no black holes. So what we do is that very consistently, we take a snapshot of the SDN controller and by itself it turns out to be an interesting large data set. We run a MapReduce on these things and the key to that MapReduce is based on destination subnets. We ask all the nodes in the map to come back with, do we have a loop or do we have a black hole? And then at the end you reduce all data sets. This is ‑‑ actually, the initial set of it is a little bit expensive, but once you do the initial calculation, because you do differential snapshots, you can detect loops and black holes within one millisecond of them being introduced in the network.
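The loop and black hole check keyed on destination subnets can be sketched roughly like this. It is a toy, single‑process illustration under stated assumptions, not the production system: the snapshot format, the `DELIVER` marker and all function names are hypothetical, subnets are matched exactly rather than by longest prefix, and the differential snapshots mentioned in the talk are not modelled.

```python
from collections import defaultdict

# Hypothetical controller snapshot: (switch, destination subnet, next hop).
# A next hop of "DELIVER" means the subnet is directly attached.
SNAPSHOT = [
    ("s1", "10.0.1.0/24", "s2"),
    ("s2", "10.0.1.0/24", "s3"),
    ("s3", "10.0.1.0/24", "DELIVER"),
    ("s1", "10.0.2.0/24", "s2"),
    ("s2", "10.0.2.0/24", "s1"),   # loop between s1 and s2
    ("s1", "10.0.3.0/24", "s3"),   # s3 has no rule for this subnet
]


def map_phase(snapshot):
    """Map: group every forwarding rule under its destination subnet."""
    shards = defaultdict(dict)
    for switch, subnet, next_hop in snapshot:
        shards[subnet][switch] = next_hop
    return shards


def reduce_phase(subnet, rules):
    """Reduce: walk the per-subnet forwarding graph from every switch."""
    problems = set()
    for start in rules:
        seen, node = set(), start
        while node != "DELIVER":
            if node in seen:
                problems.add(("loop", subnet, start))
                break
            if node not in rules:
                problems.add(("black hole", subnet, start))
                break
            seen.add(node)
            node = rules[node]
    return problems


def verify(snapshot):
    """Run map then reduce and return all findings in sorted order."""
    findings = set()
    for subnet, rules in map_phase(snapshot).items():
        findings |= reduce_phase(subnet, rules)
    return sorted(findings)


for finding in verify(SNAPSHOT):
    print(finding)
```

Because each subnet's rules form an independent shard, the reduce step for one subnet never needs the rules of another, which is what makes the work easy to spread across many workers.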
On the app level SLA side, there are so many different types of levels and measuring of SLA. What I want to focus on is the end‑to‑end SLA that we have in our network and that turns out to be the one that most of our internal customers are interested in. They don't want to know how our switches are performing, they want to make sure they have connectivity across fabrics. What happens is that we spin up a number of probes in the network, in certain data centres, in all data centres actually, and these probes start sending beacons to each other. It's like a dial. You keep turning this dial to make sure you have a hundred percent coverage in all the aspects of the network, and then, again, the result that comes back will tell you if you have loss‑free connectivity across these fabrics. There's ‑‑ in the interests of time, I'm going to skip this, but there's actually a lot of interesting data in the paper that we published about the initial size of this dial and how big it should be and how fast you should rotate it in order to get a complete picture of availability.
This part ‑‑ I'm glad actually Salt was explained, because it actually ties into the same challenges that we had. Now, you have the data sets, we talked about the sensors, how do we move this data around in the network and how do we know what telemetry beacons the device is supporting? SNMP, for this audience, I should probably skip this slide, you all have seen the pain that's attributed to SNMP. It seems to be in desperate need of being updated for today's telemetry needs. The automation has come a long way, we have got from just SSHing into devices and grabbing snapshots and doing one‑off things to building these automation frameworks where your application, your high level application, no longer has to be worried about the type of device or the connectivity to that device. We do have one effort that Google is part of, but a lot of you guys are already part of it as well, to move towards a vendor‑neutral model‑driven config. Rather than having drivers that know how to talk to a certain device, we'd like to be able to come up with a way that all the manufacturers, all the vendors can support a common configuration model and you just push this configuration model to the devices and you don't have to be worried about the way they parse these things or the kind of questions that you guys asked in the previous talk.
That's OpenConfig, that effort, and the list of operators that are participating in this is growing. We are coming up with ‑‑ it's a very collaborative environment. Operators come up with their way, their needs in the network, and we try to add it into the model, and there are some vendors that have already committed and have released OpenConfig support. A day in the life of OpenConfig, I don't have actually snapshots like the previous talk, but you can see a day in the life of OpenConfig looks like this. You have these models. You can bind Python classes and then you programme it, you can describe these models, and once you save it, you can push these configs to the devices. Let me give you one example, let's say draining a link. What happens is that you provide an API to the operator. The operator updates a model, the network topology model, and you grab these network topology models, translate them into OpenConfig models and you basically push the OpenConfig model directly to the device and the device knows how to deal with that. They expanded OpenConfig beyond just the configuration, and now they support telemetry. The model that was agreed upon and is being deployed in the OpenConfig world is a streaming model. Let me give you an actual example.
The devices, they broadcast the type of telemetry signals that they understand. And you can actually ask the devices for a subscription to those streams; they keep streaming out the data that you said you are interested in and they keep pushing it up to the collectors, and then you can have message brokers and the rest of the chart, I'm sure you already have similar ways in your networks where you can just extract time series data out of it. The underlying protocol can be, and what we will be encouraging is, HTTP/2. It will create a two‑way channel so that you can push config as well as retrieve data, and, because it's already on HTTP/2, you get the compression benefits. So it seems more efficient on the wire.
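The subscribe‑and‑stream pattern described here can be illustrated with a small in‑process sketch. Everything in it is hypothetical (the `Device` and `Collector` classes, the path names, the callback signature); it only shows the shape of the interaction, with no real transport or OpenConfig schema involved.

```python
from collections import defaultdict


class Device:
    """Toy device that advertises telemetry paths and streams updates."""

    def __init__(self, name, counters):
        self.name = name
        self.counters = dict(counters)
        self.subscribers = defaultdict(list)

    def capabilities(self):
        """Paths this device says it can stream."""
        return sorted(self.counters)

    def subscribe(self, path, callback):
        """Register a collector callback for one advertised path."""
        if path not in self.counters:
            raise ValueError(f"{self.name} does not support {path}")
        self.subscribers[path].append(callback)

    def update(self, path, value, timestamp):
        """Device-side change: push the new sample to every subscriber."""
        self.counters[path] = value
        for callback in self.subscribers[path]:
            callback(self.name, path, value, timestamp)


class Collector:
    """Accumulates pushed samples as per-path time series."""

    def __init__(self):
        self.series = defaultdict(list)

    def on_sample(self, device, path, value, timestamp):
        self.series[(device, path)].append((timestamp, value))


device = Device("sw1", {"eth0/in-octets": 0, "eth0/out-octets": 0})
collector = Collector()
print(device.capabilities())
device.subscribe("eth0/in-octets", collector.on_sample)
device.update("eth0/in-octets", 1500, timestamp=1)
device.update("eth0/in-octets", 4500, timestamp=2)
device.update("eth0/out-octets", 900, timestamp=2)  # nobody subscribed
print(dict(collector.series))
```

The point of the push model is that the collector never polls: it states its interest once and the device decides when to send.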
There's a list of three vendors that have committed, but that list is growing. Not only on the switch gear, we also have some interest on the optical side of the networking gear as well to expand this. So, we would like to ask, if you haven't done so already, check out openconfig.net. It's an open community and folks are very welcoming of new members, if you have ideas on how all this should look, or if you can talk to your vendors and see if they are interested in moving in that direction.
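The link‑draining flow from the OpenConfig walkthrough (operator updates a topology model, the model is translated into a vendor‑neutral config, the config is pushed) might look roughly like this. It is a sketch under loud assumptions: the `Link` class, `drain`, `to_device_config` and `push` are all invented for illustration, the document layout is only loosely shaped after an interfaces model, and treating "drained" as "interface disabled" is a simplification, not a claim about how draining is really done.

```python
import json
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical topology-model entry for one link."""
    device: str
    interface: str
    drained: bool = False


def drain(topology, device, interface):
    """Operator-facing API: mark a link as drained in the topology model."""
    for link in topology:
        if link.device == device and link.interface == interface:
            link.drained = True
            return link
    raise KeyError(f"{device}:{interface} not in topology")


def to_device_config(topology, device):
    """Translate the topology model into a vendor-neutral config document."""
    interfaces = [
        {
            "name": link.interface,
            "config": {"name": link.interface, "enabled": not link.drained},
        }
        for link in topology
        if link.device == device
    ]
    return {"interfaces": {"interface": interfaces}}


def push(device, config):
    """Stand-in for the real transport; here it only serialises the model."""
    return json.dumps({"target": device, "config": config}, sort_keys=True)


topology = [Link("sw1", "eth0"), Link("sw1", "eth1"), Link("sw2", "eth0")]
drain(topology, "sw1", "eth1")
print(push("sw1", to_device_config(topology, "sw1")))
```

The application only ever touches the topology model; nothing in `drain` knows what kind of device sits at the other end.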
Finally, when it comes to moving the data around, we do use gRPC, that's what Google internally uses for a lot of things, not just telemetry. It's an Open Source version of our internal RPC mechanism, and we encourage ‑‑ there are others as well but we encourage that you consider gRPC when you are moving telemetry data out of these devices.
All right. To summarise it. I would like to emphasise some of the key take‑aways. Thinking out of the box means that you basically slice your box into different planes and you see what SDN can do for these different stages. We talked about analytics and sensors in the data centre and how they become ‑‑ simple sensors become challenging when you are deploying them at scale. Sadly, in a lot of the scenarios that we explained, humans are almost useless because the scale of the network and reacting to these changes is just beyond human capabilities. We introduced OpenConfig and I ask you guys to consider it, and gRPC as a transport mechanism. If I got you excited about Jupiter, there are two ways you can see it: there is a 360 video, the link is here, you can see that in VR, or, of course, the second way you can see Jupiter is you can join our team, which is called Cluster Engineering, if you are interested.
All right, that was my last slide, and with that, if there are any questions...
CHAIR: Any questions? Or is everybody just stunned? I was depending on you guys to have questions. I've not got one. I have just failed. Have you got one, Marcus? In which case, if there are no questions, then ‑‑ oh, there is one, John Curran.
AUDIENCE SPEAKER: John Curran, ARIN. So you detect ‑‑ you have this roving probe you spin up all over the network and you can detect, once you do the reduction, you said within a millisecond, you can detect a routing loop for example.
HOSSEIN LOTFI: Yeah, 1 millisecond.
AUDIENCE SPEAKER: So ‑‑ it takes a millisecond. Does that alert someone or does it actually do something about it?
HOSSEIN LOTFI: It does something about it. You take advantage of the SDN, when you detect failure you can instruct the SDN to react.
AUDIENCE SPEAKER: So you have cases where you are actively correcting things, algorithmically based on the detected loops?
HOSSEIN LOTFI: Correct.
AUDIENCE SPEAKER: Wow. If that goes wrong, it goes wrong really quickly.
HOSSEIN LOTFI: That's the case with automation, if automation goes wrong then you can break things really really fast.
AUDIENCE SPEAKER: I was impressed that you had automation on that.
CHAIR: There was a really good talk on network automation at the Netnod meeting from Job Snijders and it's worth looking up in the webcast, this was at the last Netnod meeting which was in March, again it was the whole sort of thing of yeah, and I did this and I built this thing and it automatically went and destroyed my network, I regret this now. Again, it was the other thing, when the automation breaks how do you go and recover things after it's obliterated? Obviously you get garbage in, garbage out.
HOSSEIN LOTFI: Every time we come up with a new automation idea we have to go through these reviews. And the person who created the SRE model, Ben Treynor, he commonly asks this question: what is your blast radius? If automation goes wrong, how far can you blow things up? So we always have to be conscious about limiting that.
AUDIENCE SPEAKER: Dimitris Kalogeras from NTUA. My question is associated with the previous question. So, are you saying you are not having any kind of routing protocol to detect routing loops?
HOSSEIN LOTFI: These are OpenFlow sessions. So we always, we have to rely on... so the switches do not decide anything about how the network packets should be routed. It's all decided by the SDN controller.
AUDIENCE SPEAKER: In any case, I mean, so, everything is done at a central point, you say?
HOSSEIN LOTFI: Right. Right.
AUDIENCE SPEAKER: Jan. Based on the previous question from John Curran about when things go wrong. The usual quote from Ivan Pepelnjak is, let's make our mistakes consistent across the network.
AUDIENCE SPEAKER: Hello. I got a question from Sebastian from Nors Network. There is an option to use Jinja2 templates. Are these available? I imagine they're huge.
HOSSEIN LOTFI: I believe they are, yeah.
CHAIR: If there are no further questions, I'll close the mics, and I'd like to thank Hossein for giving that fantastic talk. Then invite our next speaker to the stage. It's a familiar face, it's Shane Kerr. He is going to talk about the Internet of things, what is the problem. I'm a complete Luddite on Internet of things. Now, I don't want like an electronic key fob and I don't want to have to yell at Alexa to turn my lights on.
SHANE KERR: So, this will be a radical departure if you saw my talk yesterday.
So, as the title of this talk is, it's IOT, Internet of things, what is the problem? And the sub‑title is how to explain to your boss that IOT won't make the company rich. And this talk was actually one that I put together for my boss, because I am a DNS guy and he came to me and said IOT, what are we going to do? How do we solve this problem? And so I started thinking about, well, is there a problem? What is that problem and so on. So to be clear, I'm not an IOT expert. So, these are just my musings about it based on reading that I had done, talking with people and things like that.
So, I think we can all agree if nothing else about IOT that it is one of the latest buzz words right now. And it has all the nice things that we like about buzz in the IT industry. It's kind of new. It's not super new, it wasn't invented yesterday. So it's not so new that people haven't heard of it but it's new enough that it's not like anyone knows anything about it. It's technological. But it's not really hard. Everyone kind of gets it. So that's great, it fits in this nice space. It's growing. So, business people are interested in it because they like anything with growth. And it also fits into this buzz realm because you can kind of ‑‑ if you squint and look back far enough and you don't think too clearly about it, you can really imagine it being basically whatever you want. And, of course, no one wants to be left behind and not be part of this great new thing. So, that's kind of where it comes from. And IOT is completely new and different, bosses are interested in it, it's not at all like web 2.0 or virtualisation or SDN or Cloud computing or OpenStack. We could go on for hours talking about the next new thing that we all had in our careers if you have been involved with this for more than like a year, so...
So, anyway, I have been talking for a few slides here and not really explaining what I think IOT means. There are of course many different definitions, some more formal, some more informal. The one that I like is: on today's Internet we connect people together and tomorrow's Internet connects things together. That's basically it. And of course, today's Internet also connects things, especially in the kind of infrastructure space that we work in, but conceptually it's kind of a qualitative difference. We don't expect shoelaces to talk to each other today, but maybe tomorrow they will. Like, I'm untied. Me too. Anyway...
So, trying to figure out what the problem is. Let's think, what are the questions? The first question that I had as an old school Internet guy is, will it scale? And since I have been involved with the Internet for many years now it's always been a question of like, we're constantly reaching limitations of the technology and of the infrastructure that we have. On the other hand, so far, we have managed to overcome all the challenges that we have had. We went from /etc/hosts to DNS. We went from classful to CIDR and now there are ten thousand times as many computers as there were in the past. And so IOT may mean ten thousand times more, well, it doesn't seem so horrible. How do we scale today?
There are two basic types of scaling. One is horizontal scaling. This is just adding more boxes basically, so you need to make sure your technology can be ‑‑ the capacity can be increased by adding more hardware. This is ‑‑ the approach taken by services, most web providers use this kind of approach, things like that. It's not always trivial to do that. You need to design your things in a way that can be scaled up just by adding more and more stuff. For example, SQL databases are difficult to scale that way because of their ACID requirements, which is consistency in how the transactions work. Financial things are difficult to do this way because a trust anchor needs to happen at one place at one time and things like that. But of course, because we need to scale up horizontally people have investigated NoSQL, we now have blockchain, which people are looking at for scaling. So any time there is a limitation, again we try and find technologies to work around them. This kind of horizontal scaling has been known for many years to not work at Internet scale for everything. What we need is hierarchical scaling and you may not believe this but I drew this diagram myself on a piece of paper.
So, the real way that we scale beyond horizontal scaling is with hierarchies, and this is an old school computer science technique where basically you have a tree, or an inverted tree. We use a tree, old computer science technique, and we apply this to the network in various ways today. That's how IPs work. We all know this, it's how the DNS works, we all know this, and as far as we can tell there are no obvious limitations to how well this is going to scale. So, for me, conceptually it seems like these technologies will work, hierarchical scaling will work even at the Internet of things scale.
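The tree idea can be made concrete with a toy naming hierarchy. This is a minimal sketch, not how a real DNS resolver works; the `Zone` class and the `register` and `resolve` helpers are invented for illustration. What it shows is that each zone only needs to know its own children, and a lookup costs one step per label regardless of how many names exist in total.

```python
class Zone:
    """One node in a toy naming hierarchy; each zone knows only its children."""

    def __init__(self, label):
        self.label = label
        self.children = {}
        self.address = None

    def delegate(self, label):
        """Return the child zone for a label, creating it if needed."""
        return self.children.setdefault(label, Zone(label))


def register(root, name, address):
    """Add a name by walking (and creating) delegations from the root."""
    node = root
    for label in reversed(name.split(".")):
        node = node.delegate(label)
    node.address = address


def resolve(root, name):
    """Walk from the root, one delegation per label; return (address, hops)."""
    node, hops = root, 0
    for label in reversed(name.split(".")):
        node = node.children.get(label)
        hops += 1
        if node is None:
            return None, hops
    return node.address, hops


root = Zone("")
register(root, "www.example.net", "2001:db8::1")
register(root, "lamp.kitchen.home.example.net", "2001:db8::2")
print(resolve(root, "www.example.net"))
print(resolve(root, "lamp.kitchen.home.example.net"))
print(resolve(root, "missing.example.org"))
```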
So, stepping sideways a little bit. What are the three fundamental technologies on the Internet? Addressing, IP addresses for us, routing, BGP at the Internet scale, and then names, which we use basically to mean DNS.
Well, right now we are moving from IPv4 to IPv6 on the addressing side. That should be enough for IOT and any other purpose for new addresses that we can come up with before I retire, which seems good enough.
And there is a bit of hierarchy in there. But we can evolve how we use IPv6 if we need to. From the way I was thinking about this, looking at it, considering addressing in IOT, IPv6 should do it. We don't really need anything new, there's no real problems here.
BGP, it's a much crappier picture, I'll be honest. By BGP we typically mean external routing. On the internal side we don't have to scale up to the Internet scale. We could use BGP internally, some people do in their networks, but it's not necessary. And from where I look at the Internet, BGP is currently the weakest link in this whole picture. But it's been known to be a weak link for as long as I have been working on this stuff and it seems to keep working somehow. It's not great but it does seem to keep working. Maybe we'll need to look at other techniques like maybe SDN or something, I don't know. This is the biggest question mark from my point of view as far as scaling to the IOT.
On the naming space, we have DNS. It's of course not the only way to name things. There have been many names in the past. There will be many more names in the future. But, DNS has the nice property that it's hierarchical so it scales quite nicely. And, of course, DNS is old and has its own set of problems, it keeps me employed, so that's good. There are security problems. DNS is hard to change. Things like that. And of course, throughout the years people have proposed alternative systems which have different properties. None of them have become super, super popular. Right now, DNS remains the best. Keep using DNS, keep me employed.
Right. So the answer to my question is, will it scale? Addresses? Yes. Routing? Probably. BGP sucks, but it seems to work okay. Maybe it will keep working. And naming, definitely I think yes, DNS will continue to scale.
Well, if scaling isn't the problem, maybe I started thinking about the questions wrong, what are the actual problems? And so I picked a few that I think everyone can agree are problems. The first is compatibility. And as always, as long as there have been companies, companies have tried to achieve vendor lock‑in. It creates isolated systems. It creates walled gardens, there are papers that ISOC has put together talking about this problem. Today, the Internet currently has a lot of proprietary protocols. Even with the obvious benefits of open protocols. So, I don't think in the IOT world this is going to be any different. Vendors will constantly strive to trick and fool people and companies into using their proprietary products and get stuck in these vendor lock‑in walled gardens.
Another problem that IOT really has is obsolescence. So, this picture here, that's my solar water heater, it comes from an era before solar panels for electricity were economical. It was installed in my house when I bought it. I'm not exactly sure when it's from. I think around 1998. The vendor is still in existence but they don't make this one any more, of course. It's not networked. It doesn't fit in the IOT world but it continues to work great. So, this is non‑Internet of things. This is things. Right.
In the traditional world, appliances last many years, and ‑‑ but then you think about the modern world and you think how old is your phone? I know people who do have very old phones. I know a guy who recently had to get rid of his flip phone because the calendar didn't work past 2015. But, most people have a smartphone and it's usually a year or two old. I dropped my phone on Friday before I was coming out here so my phone is exactly four days old.
So, right now, we don't have any good solutions for how to update old hardware. It's not impossible, of course. There's no law of physics which says that you can't download new firmware for old hardware. NASA does stuff like this, the military, the airline industry has equipment that runs for decades, right. So it's not impossible. But that's not really the kind of stuff that people think about when they think about Internet of things. Part of the problem here is economic models are mismatched. When you buy an appliance today, like something to wash your clothing or dishes or things like that, usually the vendor makes their money when they sell it and then they may make a little bit of money in after market stuff like replacement knobs, but basically they are done. That's not the model that we use on the Internet. The model that we use on the Internet is constant vigilance and eternal maintenance. I have an IP TV which doesn't support Netflix, it won't get new firmware. Maybe there are things that can be done. Maybe we can try to push Open Source. Have open platform standards and things like that, which doesn't match well with the previous slide about the desire for vendor lock‑in. So, I'm not at all confident that we have a good picture in obsolescence, which is why it's one of our challenges.
The next one of course is security. This relates to the previous slide. As I said the current best practice is just to patch the bugs that we have discovered after we have got things in production. It mostly works today. I think maybe some security people would argue that we are losing the war and it's not working at all. Nevertheless, people are able to use the Internet and derive value from it. I can file my taxes online. I can pay my bills, I can order a taxi and things like that. So, it basically works, this whole patching bugs thing.
But the unanswered question I think is can it work for unattended systems? We are talking about Internet of things. Again, things talking to things. These things are not managed by people. They are unmanaged. Can this work? And I think we just don't really know. And as I said before, there are some prior examples to this. Space probes, you know, there's scientific researchers who will leave something monitoring temperature or humidity out in the woods or on a lake and they do update this stuff. So it may be possible. New standards may help in this area. But again, I think this is actually an open question and a big challenge.
So, getting back to the bosses of the world, who will come to you and say so, IOT, and what he really means, or she, is what they really mean is, can we get rich with IOT? And from my point of view the short answer is no. Just no. You are not going to get rich with IOT. I'm sorry, I see very many sad faces in the audience, that's not the message you wanted to hear. But the longer answer is yes. All you have to do is find some people who are willing to spend money to fix these problems, the ones that I documented there, compatibility, obsolescence, security, engage with them, solve their problems, do this for lots of these people over a long period of time. You can make money with a business. That's just the way the world is. It's not secret sauce.
So, maybe the final answer that I would tend to give people, again you are not going to get rich, but you can make a successful business and you can have fun doing it and you can make the world a better place. Which, I think is what you should always be striving to do.
So that's my short talk about IOT, and that was what I intended to give before I came to the meeting. So I have had a few talks with people since then, and I have evolved my thinking a little bit, and so I put in a bonus slide which is my conspiracy theory slide. So everything that you saw here before is true and I stand by those observations, those statements but there may be a different picture that we should consider as well. And this comes from a cultural and political and societal way of looking at things, and if you have been around the Internet for a while you probably know that there's been a longstanding battle, I'll call it a battle, between the telcos and the IP folks, so my very first job was working at a telco and then I moved to the IP side, and there was this mutual disdain, the telephony people thought the IP people were hackers and didn't know what they were doing and just throwing things together with string and gum. And the people on the IP side thought, well these telco people are more concerned with process and guidelines and restricting competition and not getting anything done. Personally, I think everyone was right. And from what I understand, RIPE itself was started partially to get around these restrictions that telcos were putting in place. And over time, basically, IP won. IP telephony, TV over IP, IP everything, the telephone system still exists and it's still important and, to be honest, more people have a mobile phone than have Internet access but what they can do with that is make a phone call and send an SMS, when you compare what you can do with the Internet and all the value that the Internet has brought to us as a society not only from an economic and technological point of view, but from a quality of life and bringing people together point of view, the Internet has totally defeated the telco system in its race.
Telcos still don't like the Internet. It's more appropriate to say they don't like the open Internet. Of course they can like the Internet because they can make money. They don't like the fact that it's not something that they run. And I think governments are in very much the same situation. They have a love/hate relationship with the Internet. Of course, they love that their citizens are happy and wealthy and healthy and all these kind of things but they would much rather they did it in a controlled, constrained, less scary way than the open Internet. And of course like I mentioned a few times earlier in this talk, vendors are also not super happy with this open Internet idea. Of course they love the Internet because they get to sell a lot of kit. Except for the ones that went out of business. Basically vendors like selling stuff but they want to be able to continue to sell stuff without having to spend time thinking of new ideas or researching things. It's an expensive and uncertain idea, having competition. What are we going to do about this problem? From my point of view, from the fundamental point of view, IOT doesn't bring many problems. It brings specific challenges because of the new way it works. If you think about the problem as being we have a free and open Internet, and that's the actual problem, then the Internet of things becomes a much more interesting solution. And using the word 'Internet' in Internet of things is actually kind of a trap and a trick and basically a lie. What we really want is a network built together using 5G, which doesn't exist yet. But 5G is really popular and exciting because here is our chance to build a greenfield network which we can use that will require, for example, a SIM in every object, or an eSIM, these kind of things. And it will require new proprietary network equipment, new proprietary protocols. It's a glorious new age of proprietary closed systems.
If the problem statement is we want to do away with the open Internet, maybe IOT is the real problem and the real answer.
That is the end of my presentation. Are there questions?
CHAIR: The mics are open. Thought‑provoking. Thank you.
AUDIENCE SPEAKER: Benedikt Stockebrand. There is another issue to this. The IT industry, for the last couple of decades, has trained people not to really trust this stuff. We have had decades of experience with blue screens, we have had advertisements claiming Internet access with guaranteed availability of up to 99%. Basically, that allows about 90 hours of downtime a year, which basically means two business weeks, and it also means you buy this fancy new door lock that you can open with a smartphone and at least twice a year you return from work and you won't be able to get in, if you combine these numbers. And a lot of people, for very good reason, have strong suspicions about this, and of course the other situation is, oh, yeah, my smartphone is six weeks old and I need a new battery, sorry, you need a new one because that thing has been discontinued like four‑and‑a‑half weeks ago, that sort of stuff. So basically, unless we accept that people buy a new fridge, like, every 20 years, or a central heating system, or whatever, and adapt as an entire industry to this sort of way of doing things rather than the Internet sort of thing where we change things every four weeks, people will be very, very sceptical for very good reason for quite a long time. Unless we can actually change the way we work as an industry and get kind of more mature, this will be a very difficult thing. It will be very exciting if we get to this point, but general acceptance will be difficult. If a TV doesn't work it's annoying enough, but if the heating shuts down like we had with these Nest Thermostats where they deployed updates mid‑December and for January 1st ‑‑ well, part of the people live close enough to one of the Poles, or especially the North Pole at that time of the year, to decide this is uncomfortable.
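The availability figure is easy to check. As a small worked sketch (the function name is made up, and the year is taken as 365 days), 99% availability leaves 1% of 8,760 hours, that is 87.6 hours a year, which is a little over two 40‑hour business weeks:

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours(availability_percent: float) -> float:
    """Hours per year a service may be down at the given availability."""
    return HOURS_PER_YEAR * (100.0 - availability_percent) / 100.0


for percent in (99.0, 99.9, 99.99):
    print(percent, round(downtime_hours(percent), 2))
```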
SHANE KERR: So I have a question for you: are you at all confident? Would you trust devices, even devices that you built? I'm not sure that I would.
BENEDIKT STOCKEBRAND: If I had a pacemaker, no, I wouldn't trust it. But there are a couple of other things where the risk is not half as bad and that's probably the way we have to go. We have to start with the stuff that's not so critical.
SHANE KERR: I completely agree, yes.
AUDIENCE SPEAKER: Randy Bush, IIJ. Do not talk about security problems in the future tense, or do, but let's not forget the present tense. Benedikt mentioned a couple, but medical devices that don't encrypt. The list goes on. The Nest Thermostat fiasco, automobiles that can be jacked remotely. The security problems of IOT, it's hard to underestimate. It's really hard to underestimate. And the effort going into understanding the problem is vastly under scale. The effort to view the question as are we even talking about Internet protocols, as you say they are sliding it into proprietary stuff where they will bury it under the covers. It goes on. The security stuff will be vast and horrifying.
SHANE KERR: I totally agree, Randy, and I think the companies, the C‑level people who are involved with designing strategies around this, cannot even begin to conceive of the increased attack area and the massive risk that they are about to dump on society, or have already started dumping on society.
RANDY BUSH: They have seen some of it. The manufacturers, light bulb manufacturers, who think they are renting the light bulb instead of buying it.
SHANE KERR: I'm not sure they understand what it means. If they are renting it, then that means that they are liable too, and I don't think they have gotten through all that ‑‑
RANDY BUSH: I have a saying: when you do not think of having your customers' data as an asset, it's a liability.
SHANE KERR: Absolutely. Absolutely. Yeah.
CHAIR: Can we take Marco and then we'll take Sander and then we'll take that one.
AUDIENCE SPEAKER: Good morning. Marco, RIPE NCC, one of them. Continuing where Randy left off on security. On your security slide you mentioned new standards. In that sense, would that then mean technical standards, or are we more looking into new or basically altered behaviour, or are we looking at human standards when it comes to security? And in relation to your last slide, does regulation play a role here? The easiest example here is that data leaks, starting in the next few months, are going to be a very expensive affair in the EU if you get caught.
SHANE KERR: So, I'm a huge fan‑boy of Bruce Schneier and he reminds us that security is a process and so, I think anyone who thinks about security of course, it can't be just a protocol standard, right. It can't be a set of best practices. It can't be a set of social norms either. I think it has to be all of these and my own belief is that it also needs to include a legal framework, and I think we are so far away from a comfortable political environment today where we could achieve a reasonable set of legal standards, I really don't even see a path for us getting to there from here. I'm not a politician and I don't know the details of regulatory markets. But I do see, like everyone does I think, all the conflicts that governments have trying to decide very simple things like how ICANN should run. Now, move that down all the way to figuring out how IOT should work with plumbing across different standards, different market forces, different political environments, I think lawyers are going to have their work cut out for them on IOT alone for the next few hundred years, so...
AUDIENCE SPEAKER: That's the trick to get rich then. Become a lawyer. Remember: become a lawyer, yeah.
CHAIR: That's not a new trick though, is it?
AUDIENCE SPEAKER: Hi. Sander Steffann. You mentioned compatibility as one of your issues. I think it has been hinted at already, I think it also should be like dependencies. My thermostat is dependent on the server of the vendor. I have had my house be insanely hot because I couldn't turn the heating down because my thermostat's vendor's server had a problem.
SHANE KERR: Mission accomplished.
SANDER STEFFANN: What I think is really important, if you do IOT stuff in your house, is that you can be independent of your vendor. If your vendor decides not to support your product, you don't want it to turn into a brick. I don't know how to solve it, but I think that's a really important aspect.
SHANE KERR: That may be something that has to be solved with legislation. I hate to say that because I think ‑‑ especially in this community, we hate Government regulation. I think the Internet community is very anti‑Government. I think possibly coming from its origins as an anti‑telco thing where regulation was all designed to restrict the Internet and still continues to restrict the Internet. We saw yesterday a presentation about the situation in Africa. But I think ultimately we probably can't count on vendors to do the right thing based on the goodness of their heart.
There is another related issue: you mention your home setup and wanting to be able to control the devices in your home and things. I think we need to be a little bit careful about thinking about the models of how IOT stuff is deployed. Right now, we have a model where you have a home gateway and a lot of relatively smart or dumb devices that talk with that and manage a specific area. I think it's a logical outcome of the way that we connect to the Internet now. We have got a limited number of addresses, we tend to get one IPv4 address and we have a router that connects to that and manages all of our other devices. It's a very natural mapping for us to then go from that model to just swapping out network devices for, say, other devices, right. I think there are other more interesting and powerful models that we could use if we could have more flexible architectures for how devices talk together. I haven't thought this all the way through. But you can imagine different amounts of control and information from IOT devices depending on who owns the building versus who is renting the specific property; I am thinking of businesses and things like that. I think we need to be careful about not just easing into this kind of way of thinking about things, because that model of having a single gateway and a bunch of devices connected to that works great for vendor lock‑in and I think we need to be careful about encouraging that. So...
AUDIENCE SPEAKER: Hi, Brian Trammell, ETH Zurich. Can you go to your last slide again? Can I see the slide after this one, because this one is really depressing. There has to be a slide after this one. So this is a little bit of ‑‑ I'm putting you on the spot a little bit. Thank you for the talk, this is good food for thought. And I totally want to steal this talk but it needs a slide after this one that says how to make it better. And the discussion here is kind of like getting there, and you know, that's a little bit of an on‑the‑spot question, you know, what's your suggestion for the next slide? And maybe that's something that you can do to update this talk after looking at, you know, this discussion, but... I'm sad now and I need coffee.
SHANE KERR: It's not too early for beer. It's afternoon somewhere. I don't have a good answer for any of this stuff. As I have mentioned, I think a lot of these problems are just hard problems and they require societies and everyone to think about them, which is going to be hard. I mean people are busy and they have their own concerns in their own professions and their own focuses. Getting politicians or people who own hotels and things like that to think about the, frankly, semi‑technical details about how this stuff works, is a distraction for them. I don't have any good answers. I don't know.
AUDIENCE SPEAKER: One of the take‑aways is there is a bunch of net heads talking about invoking legislation, which is interesting to me. I'm just going to leave it at that. Thank you.
SHANE KERR: Sorry, don't cry, it will be fine.
CHAIR: As far as the beer goes, it is always after five o'clock somewhere in the world.
AUDIENCE SPEAKER: Scott Leibrand. Affiliation in this case is OpenAPS. Benedikt mentioned being afraid of doing IOT stuff for pacemakers. My wife has Type I diabetes and we built her an artificial pancreas. This is remotely controlling insulin delivery, which is a lethal drug, 24/7 with an Internet‑connected device. Yes, there are security risks, there are bigger risks of having diabetes. This ‑‑ the comments about Government intervention being required and such, there is a place for Government to play, but from my experience, they are better at screwing things up than making them better. I'll give you an example. The medical device that we use to control the insulin delivery is an old one because the new ones were locked down for security reasons.
SHANE KERR: That's a very good point.
AUDIENCE SPEAKER: You cannot actually control insulin with anything newer than a 522 Medtronic pump, because people were worried about getting killed. There are a lot of unintended consequences of security. I think a better approach is for this community to talk about what we can do with DIY and Open Source solutions. How can we get vendors to provide us access to the devices in our homes so that we can control them ourselves, or in the case of medical devices, get the data? There's a lot that we as a community can do and, as the hashtag goes, we are not waiting in medical devices, and I think there's an opportunity for everyone here to participate in making the Internet of things an actual boon for everyone, not just for vendors.
SHANE KERR: Thank you. Can I respond to that because I think that was a really interesting comment and I think that could involve a whole panel discussion of its own, the confluence between the hacker/maker community, versus corporations and how Government gets involved with that. And I think that's a really good point and one that I actually, to be honest, had, frankly, overlooked. Nevertheless, I think what we're seeing in this case is a combination of governments and vendors, these two evil people here working together, for reasons that they probably believe are good and are plausibly good also, they sound reasonable, but in fact can be restrictive and lead to worse outcomes. So thank you very much.
AUDIENCE SPEAKER: Josef. And this was really interesting because yesterday I was promoted to be Chairman of the board of an IOT company. Actually I would like to make a comment more with my ICANN hat on. You are of course aware that this is exactly the way our good friends in Geneva, the ITU, are trying to make themselves relevant again.
CHAIR: Can we take a comment ‑‑ these guys have been waiting a really long time.
AUDIENCE SPEAKER: Fredy Kunzler, Init7. You mentioned 5G protocols but you didn't mention long range WAN. So, when we speak about walled garden versus open Internet of things, then I think we really need an open network as well, because I'm not keen on the thought of paying some mobile operators, which operate a 5G network, for every device I'm going to run in my home or in my life in the next five years. So, that makes me state that we really need to put in the community effort to build up this open, long range WAN network.
SHANE KERR: Thank you for that comment. Someone mentioned that to me yesterday as well, that there are community networks, people trying to get connected without these managed corporate networks, and I think that's great. And I'm going to be investigating a little bit more of that, for example with the refugee hot spot that I talked about yesterday, but again, from the point of view of vendors, people getting connected without paying anyone is a problem, not a solution. So...
AUDIENCE SPEAKER: This will be quick. Niall O'Reilly, I am a scared pacemaker wearer. My battery has about 11 years, maybe 12, of residual battery life. I think I'm going to have an interesting conversation next April with my cardiologist to start the planning for 11 years' time.
SHANE KERR: Well, if I see you in the future and you are wearing a tinfoil jacket, I'll know what's going on.
AUDIENCE SPEAKER: Patrik Fältström, Netnod. I was not supposed to walk to the microphone because I'm at least as scared as you and share your concerns, but I was triggered by the wording about IOT and I would like to expand a little bit for this audience because not all of you will be aware of what's happening. ITU is doing the work in six‑year cycles and they just started the discussion on what they should do for the next four years, that leads up to a set of new treaties. That meeting is called WTSA and is happening this fall in Geneva, fall in the northern hemisphere, so, at the moment, governments around the world are asking for input on what they should say, what they think and what they should say when they are scheduling this study group, what they should do. One of the study groups they are starting up is number 20, about Internet of things, and they are changing the charter of study group 17 on security issues. Now, if any of you think that these issues should be done or not be done in ITU, please tell your regulator, Government, whatever, so they can tell the ITU. The Internet Society has written a really good summary of all of this, I think they released it yesterday or in the last couple of days. Please have a look at that, and if it is the case that your brain is as weird as mine so you are interested in these kinds of policy issues, please read that and talk to and help your governments, specifically if there are things you think ITU should not do because you'd rather go to other organisations like the RIRs or the IETF. Thank you.
SHANE KERR: Patrik, before you sit down, just so we're clear. There's a fundamental difference between ITU recommendations and, for example, recommendations from another body, because they are often somehow binding, right? Like legally binding?
PATRIK FÄLTSTRÖM: They could be. What happens is that the members of the ITU, states, if they sign the recommendations and treaties, they actually end up being binding for whoever signed them. What we should remember is that at the last conference where the members were supposed to sign these treaties, a majority of states did sign but the European states, the US etc., did not sign that treaty. There was a big mess in Dubai, which means that there is a discussion going on. But in reality, yes, you are correct.
SHANE KERR: Okay.
CHAIR: Is this really quick, Randy?
RANDY BUSH: Yes. It's been a pleasure to have a consistent bogey man in our culture, the ITU, but I'm far more worried about my code and your code than I am about the ITU.
SHANE KERR: Fair enough. On that cheery note. Thank you everyone.
CHAIR: That's a great observation and that was really good ‑‑
AUDIENCE SPEAKER: This is Patrik again. Let me just clarify what Randy said. I completely agree and that's why I'm only controlling the water in my basement with my own code and that's scary enough. Anyways, the important thing is here, from my perspective, I don't want to have the MPLS mess again where we have to fight this kind of a specific technology in multiple venues, one venue is hard enough.
CHAIR: Thanks, Patrik. Thanks, Shane. That was lovely and thought‑provoking and fantastic. Thank you very much. Before you all run off for coffee, I'm short of caffeine too, I need coffee. Really quickly, two people will be retiring from the Programme Committee by rotation at this meeting, Filiz and Jan. We are seeking nominations for the Programme Committee. They close at 3:30, Copenhagen time, today, and you can basically nominate yourself by e‑mailing pc [at] ripe [dot] net and telling us why you think you'd be a really good member of the Programme Committee. We need to keep the Programme Committee fresh because that's what puts really good content on the stage like the talks you have seen this morning. If you know of anybody that would be a great addition to the PC, convince them to stand or just e‑mail us on their behalf and drop them in it and we can follow up on that. Sorry for eating into your coffee break. We start again at 11. Thank you.