WHOIS Task Forces 1 and 2 CRISP transcription
WHOIS Task Forces 1 and 2 Teleconference
GNSO Constituency representatives:
ICANN Staff Manager: Barbara Roseman
Coordinator All participants are in listen-only mode. After the presentation, we'll conduct a question-and-answer session. Today's conference is being recorded. If you have objections, you may disconnect at this time. I'll turn the meeting over to Mr. Jeff Neuman.
J. Neuman: For those not on Whois Task Force One or Two, we submitted to Andrew and Leslie a list of questions that the task force came up with. It was merely meant as a guideline to let them know what types of topics we were interested in so they could prepare their presentation. But by no means is it an exhaustive list of all the questions we had. I'm sure there will be other questions from other people on the call. Is Andrew Newton on the call? Or Leslie Daigle?
Coordinator No, not at the moment.
J. Neuman: Is Marcos on the call?
M. Sanz: Yes.
J. Neuman: Since they're not on the call yet, are you prepared to go through the presentation until they can join?
M. Sanz: I haven't prepared very well, so wait one minute and I'm sure they'll appear. If in one minute they're not here, I'll start.
M. Sanz: It was a long time ago that we founded this CRISP working group, more than two years by now. CRISP is an information service, and one of the main goals was to be clear about what we wanted to design. It had to be a solution for all the problems that currently infest the NICNAME/Whois protocol. Whois is very old, more than 20 years. At the time it was designed, the standards of the IETF for protocols were not as high as today, so the protocol is very weak. There are no security considerations and no internationalization considerations. This was identified as a problem … had to be addressed, and this is the reason the working group was created.
J. Neuman: I think we have Andrew now, Marcos.
M. Sanz: Great.
A. Newton: Leslie is here, too. We were on but you couldn't hear us, apparently.
J. Neuman: Marcos filled in the background of the group, if you'd continue on.
J. Neuman: The way the call is set up is you do the presentation and then we'll take questions because there are 25 or 30 people on the call.
A. Newton: At the end of the day, IRIS was selected as the protocol. There was a second candidate based on LDAP called … but the working group felt that IRIS … candidate came down to ease of implementation. IRIS is now a proposed IETF standard, RFCs 3981, 3982 and 3983.
IRIS is a very flexible information access protocol that you can layer multiple registries on top of. The CRISP working group has worked on two registry types called dreg and areg, D for domain and A for addresses. The dreg registry type handles thick and thin registries and domain registrars, and areg handles what are called the Number Resource Registries, the RIRs. I should note that, before CRISP came around, the RIRs and the domain registries and registrars were actually on completely divergent tracks as far as Whois and how they were beginning to handle it. Each set of constituencies was placing different controls and things on top of the NICNAME/Whois protocol in order to meet certain requirements, and they were not compatible with each other. CRISP is actually an attempt to bring these two back into the same camp and allow one client to access both sets of data.
There is other work going on with IRIS outside of CRISP. There is ereg, which is essentially an ENUM registry type for IRIS, and which is now a working group item of the ENUM working group within the IETF.
In addition, there is new work beginning in the IETF on emergency context resolution. Essentially, that is how you discover which emergency response center to send emergency messages to, based on geographic location. IRIS is being put forward as a proposed candidate for that problem. There is other work going on in the NGN space as well. In order to have a lot of convergent network technology coming together, they need metadata access protocols for that.
Page four. What's the value of IRIS? It's decentralized by design. That doesn't mean you can't do centralization, but it takes decentralization as its core essence. This is done because of the feedback we got when we went around asking people what they wanted, and in the CRISP working group itself: the registrars, from what we heard, really wanted to keep the data on their own servers. They didn't want to send it out to a centralized spot.
That also helps in the management of the data, because any time you have data being centralized, there are all sorts of accuracy issues that suddenly arise that you never had before. So with decentralization, we actually have navigation built in, and we try to use DNS hierarchies where possible. The point of doing this is so you don't have to set up a well-known server, which can be a single point of failure in the system and is typically a political hot button when you do that. It's also a request many people had.
So we allowed TLDs to put NAPTR and SRV records in their zones pointing to their own servers. So when you do a query for something in dot-net, you look up the NAPTR and SRV records in the dot-net zone, and they point to the dot-net IRIS server.
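A minimal sketch of the lookup described here, under loud assumptions: the dictionary stands in for the NAPTR and SRV records a TLD operator would publish in DNS, and the host name, port, and function names are invented for illustration rather than taken from the IRIS specifications.

```python
# Hypothetical stand-in for records published in a TLD zone.
# Real deployments publish NAPTR and SRV records in DNS; this
# table, its host name, and its port are invented for illustration.
ZONE_RECORDS = {
    "net": {"host": "iris.registry.example", "port": 7000},
}


def tld_of(domain: str) -> str:
    """Return the top-level label of a domain name ('example.net' -> 'net')."""
    return domain.rstrip(".").lower().rsplit(".", 1)[-1]


def find_iris_server(domain: str):
    """Return (host, port) advertised in the TLD zone, or None if absent."""
    record = ZONE_RECORDS.get(tld_of(domain))
    if record is None:
        return None
    return (record["host"], record["port"])
```

The point of the design is visible in the sketch: the client needs no built-in list of servers, only the name it is asking about.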
The protocol also contains entity references and search continuations. These are ways for a server to point to other data in other servers. An entity reference is an explicit knowledge reference saying, "I know this entity exists in this server," whereas a search continuation says, "I think this data is over here, or possibly over here, so continue your search there." When it comes to creating a client, those are two very important distinctions that have to be made.
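To illustrate why a client has to treat the two differently, here is a rough sketch. The class names, server names, and in-memory data are hypothetical and do not reflect the actual XML elements of the protocol.

```python
from dataclasses import dataclass


@dataclass
class EntityReference:
    server: str   # server known to hold the entity
    entity: str


@dataclass
class SearchContinuation:
    servers: list  # servers where the data might be found
    query: str


# Hypothetical data held by each server, keyed by server name.
SERVERS = {
    "registrar-a.example": {"example.net": "record held by registrar A"},
    "registrar-b.example": {},
}


def follow(pointer):
    """Resolve a pointer returned by a server to a record, or None."""
    if isinstance(pointer, EntityReference):
        # Explicit knowledge: fetch exactly this entity from exactly this server.
        return SERVERS.get(pointer.server, {}).get(pointer.entity)
    if isinstance(pointer, SearchContinuation):
        # Only a hint: re-run the query at each candidate until one answers.
        for server in pointer.servers:
            result = SERVERS.get(server, {}).get(pointer.query)
            if result is not None:
                return result
        return None
    raise TypeError("unknown pointer type")
```

With an entity reference, a miss is an error worth reporting; with a search continuation, a miss is an expected outcome and the client simply moves on.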
The protocol also uses a thing called SASL for … authentication mechanisms, and that allows us to do one-time passwords or certificates or plain passwords or whatever. It allows greater flexibility in how you do access. Finally, the protocol is structured to allow us to do internationalization and IDN support more easily than without. We do that using XML.
Slide five. What's the cost? First off, IRIS is an open standard, published by the IETF as we said. There is no intellectual property attached to the actual protocol itself. You don't have to pay royalties to anybody to go implement it. And there is no specific implementation necessary because it's an open standard. So anyone is free to come up with their own implementation of the protocol as they see fit for their needs. I'll point out there are some open source clients … already available.
If you wanted to implement this on your own, the protocol was specifically designed to reuse common building blocks. We use XML for doing the structure and tagging of data. XML is a very well known serialization format on the Internet. We use … and SRV resource records for finding and navigating data, and we also use things like SASL for common access control mechanisms.
As for the database, the IRIS system is designed to sit on top of a current registration database. The intent isn't to have the database be changed in any way in order to accommodate the protocol itself. So there are no database changes at all; it's basically another protocol engine that sits on top of the database to get access to it.
We find that very important because the moment you start requiring changes in anyone's registration database, that means they require changes in their business process and that can get expensive really fast. So IRIS doesn't impose any matrices or tree structures or anything that requires back end data changes.
Slide six. The CRISP status so far is that the working group has met all the original milestones to create the requirements document and the core and domain registry protocols. Currently, the working group is working on the address registry. I expect that to go to last call very soon. Recently, we've undertaken new work on IRIS over UDP, which is to make it really fast, and on something called DCHK, which is a lightweight domain availability check. That will also probably go to last call pretty soon.
Page seven. There is other work going on as well. There has previously been work on putting SRV records into the zone files of TLDs, so that the Whois client can find the Whois server more easily. We're actually working on a cohabitation document which discusses how clients can use that information to find Whois servers and to find IRIS servers. So essentially, one client could access both Whois data and IRIS data at the same time, because there is no flag day for transitioning. It will be an eventual, gradual transition.
Page eight. The current deployments that I know about: for com and net, we actually do have a server up and running, and this year we plan on adding the UDP support once that gets finalized. In 2005, DENIC is going to stand up a server, as well as Nominet, and RIPE NCC is looking at standing up a server as well. I want to point out that with .de, .uk, com and net, it represents well over 60% of all registered domains.
Page nine. Navigation of servers and data. I talked about this briefly. One of the methods by which IRIS uses DNS hierarchies is that we allow zone operators to put NAPTR and SRV records in their zone. IRIS then uses that data to find the correct server to go talk to. This avoids the concept of what's known as a well-known server for the purposes of finding the data, which typically is a political hot button in some circles.
There are other methods of navigating the data. IRIS has a pluggable architecture for doing that, so you can add other resolution methods. Specific to domains, there is a top-down and a bottom-up resolution method where, if you were looking for, say, example.net, we would start looking in the zone for example.net to see where the server is. If you didn't find it there, you go to dot-net.
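The difference between the top-down and bottom-up methods comes down to the order in which zones are tried. The following is a sketch of that ordering only; the function names and the `advertised` mapping are assumptions made for illustration, and the exact procedure in the dreg specification may differ.

```python
def bottom_up(domain: str) -> list:
    """Zones to try, most specific first: example.net, then net."""
    labels = domain.rstrip(".").lower().split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


def top_down(domain: str) -> list:
    """Zones to try, least specific first: net, then example.net."""
    return list(reversed(bottom_up(domain)))


def resolve(domain, order, advertised):
    """Return the server of the first zone, in the given order, that advertises one.

    'advertised' is a hypothetical mapping of zone name to server name,
    standing in for NAPTR and SRV records found in DNS.
    """
    for zone in order(domain):
        if zone in advertised:
            return advertised[zone]
    return None
```

For example, `resolve("example.net", bottom_up, {"net": "iris.registry.example"})` falls through the empty example.net zone and lands on the server advertised by dot-net.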
For navigation, the protocol also has query distribution: entity references and search continuations. An entity reference says, "Go look here for this data," whereas a search continuation says, "Continue your search at this server." This allows registries to point to the same data at the registrar. So com may point to example.com in the registrar's database. Registrars can … there is a function for the registrant to point to stand up servers. There is a pluggable architecture. You can add new navigation methods, if needed.
Page ten, tiered access. IRIS has multiple authentication mechanisms. Authentication, especially strong authentication, is what allows you to do tiered access. In tiered access, control over what data is actually given back to the client rests with the server, and the authentication allows the server to determine what type of data the user gets back. You can actually have multiple tiers; it's not just anonymous versus authenticated. You can have many tiers depending on what the policy dictates.
You can also coordinate how this is done. There's a note that you can do this in band; there are mechanisms within the IRIS protocol itself to allow servers to coordinate policy and authentication. It can be done out of band with some other mechanism or both ways with a combination.
I put an example at the bottom of the slide. Basically, you have someone accessing an IRIS server looking for Mark Kosters or kosters.net, and they get nothing, or they get only his name and what country he is in. But if they provide the correct authentication or credentials, they get the full contact information.
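The slide's example can be sketched as a server-side filter. The record, tier names, and field lists below are invented; in practice the tiers and what each may see would be set by policy, not by the protocol.

```python
# Hypothetical record and tier policy; all names and values are invented.
RECORD = {
    "name": "Registrant Name",
    "country": "US",
    "email": "registrant@example.net",
    "phone": "+1 555 0100",
}

TIER_FIELDS = {
    "anonymous": {"name", "country"},
    "authenticated": {"name", "country", "email", "phone"},
}


def respond(record: dict, tier: str) -> dict:
    """Return only the fields the server's policy allows for this tier."""
    allowed = TIER_FIELDS.get(tier, set())
    return {key: value for key, value in record.items() if key in allowed}
```

Adding a tier is a matter of adding one more entry to the policy table, which is the sense in which the number of tiers is open-ended.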
L. Daigle: … that full contact information is provided in today's Whois server. Andy actually thought that of the Whois data, and there aren't any options in today's Whois server to do anything other than display all.
There is a thing in IRIS called a relay bag. It allows a server to relay, via the client, some authentication data to a referenced server, so access can be controlled at that level if necessary. The first server may not have the data and refers the client on to the next server, but it is able to authenticate the user and pass that authentication credential on to the referenced server.
The server also has another extensibility mechanism in the protocol itself; we call them controls. It's just another mechanism that allows the client to hand the server some extensible piece of data that the server uses to act upon a request. As an example, when we implemented the IRIS server for com and net, one thing we wanted was for the speed bumping or rate limitations from the Web server not to be adversely affected. The Web server was actually talking directly to the IRIS server. So when the Web server makes a request to the IRIS server, it hands it a control that says, "Here's the requesting IP address of the person coming to me." Therefore that specific IP address cannot get more data via the Web or the IRIS server, even if they wanted to. They couldn't take out the system, in other words.
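A rough sketch of the rate-limiting arrangement just described. The control name `requesting_ip`, the limiter class, and the limits are all hypothetical; the sketch only shows the idea of charging a query to the end user's address rather than to the Web front end's.

```python
from collections import defaultdict


class RateLimiter:
    """Counts queries per originating address against a fixed limit."""

    def __init__(self, limit: int):
        self.limit = limit
        self.counts = defaultdict(int)

    def allow(self, origin_ip: str) -> bool:
        self.counts[origin_ip] += 1
        return self.counts[origin_ip] <= self.limit


def handle_query(limiter, peer_ip, controls=None):
    """Charge the query to the address named in the control, if any.

    'requesting_ip' is a hypothetical control name. A real server would
    honour it only from a trusted peer such as its own Web front end.
    """
    origin = (controls or {}).get("requesting_ip", peer_ip)
    return "ok" if limiter.allow(origin) else "rate limited"
```

Because Web and direct queries are counted against the same address, switching between the two interfaces does not give a user a second allowance.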
Page 12: Again, the IRIS server is policy neutral. The protocol actually doesn't speak to policy, and that was one of the mantras of the CRISP working group, which is about creating an access protocol that didn't require any policy to be made within the protocol itself. So all sorts of information can be given back in the protocol, or it can not be given back, or the server can actually tell the client, "I have this data but I can't give it to you." In addition, there are other privacy mechanisms in there where the server can say, "This specific data is sensitive and you are not allowed to redistribute it." Things like that.
The information within IRIS can be centralized in one central IRIS server. It's typically an easy thing to do. The service was designed for decentralization, so you can have it distributed. Or it can be centrally indexed and then distributed, so you could have an index server that has an index of the data but not the actual data itself, and points to the place that does. This is all a matter of policy. The protocol itself is policy neutral. This gives policy makers a lot more options in making policy for the situations they're addressing.
Page 13. The protocol is well structured. It allows you better server performance because in many cases … sometimes the client doesn't have a lot of knowledge about what the request is, so the servers have to guess at the request in many cases, not all. What the structure does and what … queries is it allows the client to be very formal about what it's requesting. Therefore, not only is there less ambiguity on the client side, but the server doesn't have to go hunting through multiple database indexes to find the data; it knows exactly what the client is requesting.
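The performance point can be illustrated with a toy contrast between an unstructured query and a structured one. The index names and contents are invented, and a real registry database would of course be more involved.

```python
# Hypothetical indexes a registry database might keep.
INDEXES = {
    "domain": {"example.net": "domain record"},
    "nameserver": {"ns1.example.net": "host record"},
    "contact": {"example.net": "contact handle record"},
}


def free_text_lookup(text: str) -> list:
    """Unstructured query: probe every index, possibly finding several hits."""
    return [(name, index[text]) for name, index in INDEXES.items() if text in index]


def structured_lookup(kind: str, value: str):
    """Structured query: the client names the index, so one lookup suffices."""
    return INDEXES[kind].get(value)
```

In the free-text case the server both does more work and has to decide which of several hits the user meant; in the structured case neither problem arises.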
It provides structured, normalized data, and this does two things. It enables localization of the internationalized protocol elements, and I'll show you an example in a minute. It also provides for richer client presentation, so the client is no longer limited to text-only output; we have a graphical user interface that we've developed here. But if it also needed to be in Braille, or whatever the end user needs … we can do that.
The relationship to the query is clearly noted. Because all the data is normalized, each entity is well tagged in what the entity is. The relationship is clearly noted, and the relationship to whether it's a direct answer to the query or some sort of ancillary data is also noted in the protocol.
When you combine the well structured data of the queries and responses with strong authentication, this enables you to have very good audit trails, if the implementer wants to do that. The other side effect is, because the data is highly normalized and the queries are well known, these audit trails can be meaningful to third parties, if necessary.
Page 14. When we talk about structure and internationalization, the way this works is that the content of the data is under the control of the server. So the server dictates whether to hand back the address or not, or the postal code of the user, or whatever the information is. The actual presentation of the data, though, can be determined by the client. This allows for localization of tags within the protocol itself. So we have here a picture of our graphical client. You'll notice a lot of the menus and other things are not in English but in French. This allows a French user to more easily look at this data and be able to figure out what's going on. When they click on something like verisign-atlas.net, it opens another window. Instead of saying "domain name" it says the French equivalent.
You'll see that on slide 15. Here's a close look at the data. On one side we have English, one side French. This more easily gives the end user access to the information, especially if they are a non-English speaker.
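A small sketch of how client-side localization of tags might work. The tag names and label tables are assumptions for illustration and are not the element names used by the protocol or by the client shown on the slides.

```python
# Hypothetical tag names and label tables held by the client.
LABELS = {
    "en": {"domainName": "Domain name", "country": "Country"},
    "fr": {"domainName": "Nom de domaine", "country": "Pays"},
}


def render(response: dict, language: str) -> list:
    """Pair each tagged value from the server with a label in the user's language."""
    labels = LABELS.get(language, LABELS["en"])
    return [f"{labels.get(tag, tag)}: {value}" for tag, value in response.items()]
```

The server sends the same tagged data to every client; only the label table on the client side changes with the user's language.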
A. Newton: Yes.
Then we have the common registry, which is essentially the IRIS core layer that describes how you talk about entities and searches, and then we have the registry-specific stuff. So we have the domain and address registries. You can have data from one registry type point to data in another registry type, but this stops you from commingling the different types of data.
Also, this allows reuse of these common components. All of these things are common components on the Internet. They're easy to find, so people can implement this protocol really easily and allows us to switch them out when necessary for newer, faster things or different use cases that may come up.
Some of these common components are things like XML, NAPTR and SRV records, and SASL, which is an authentication framework for doing different types of authentication.
Slide 17. In conclusion, the IRIS … are standardized, but the basis of what we set out to do is done. We're currently working on improvements in other areas, like the address registries. We're working on the UDP and lightweight availability check. There are … for other registries like ENUM.
The benefits are that it's decentralized, and it has navigation as part of that decentralized and distributed manner, so even though it's decentralized, you can easily find the data. There is better policy support via multiple authentication mechanisms, because one of the things that plagues the current port 43 service is the fact that the only method people use for authentication right now is IP addresses, and that works for certain things but it doesn't take you that far.
And of course structure and internationalization. The structure gives us better performance and easier to understand, less ambiguous queries and answers, also allows for internationalization and better IDN support. Of course, it's extensible because there will be future needs down the road that we haven't anticipated but we want to be able to take advantage of extending the protocol for those needs.
The protocol itself is low cost. It's not intended to replace or change anyone's database. It's just intended to sit on top of the current database, much like the port 43 engines do today. It has many authorization management features built into it, so that doing distribution of authorization keys or management is much easier and far less burdensome than some would think. I'd like to point out that there are open source clients and servers already available for this protocol.
Last slide, 18. For follow-up, here are Marcos's, Leslie's and my e-mail addresses. If you have any questions or concerns that haven't been answered, you can e-mail any of us and we'll be happy to answer them. Or you can go to the CRISP working group and send e-mail to them as well and get an answer there. Jeff?
J. Neuman: Thanks a lot.
Coordinator We'll now begin the question and answer session. The first question is from Marilyn Cade.
M. Cade: Most or many of the people interested in this topic, particularly from the task force perspective, are non-technical. I think you may have questions coming that seem very simple to those of you who are technical, but challenge those of us who are not. I just have a couple of clarifications on things that I didn't understand.
On the one hand, we say that IRIS is awaiting adoption, yet we see there are existing deployments. Could you explain where (and I've looked at the list of registries that are adopting) but I don't understand yet what adoption by the rest of the registries would mean, or what it would mean if not all registries adopted.
That was a complicated way of saying that yes, servers are being stood up for individual registries, much in the same way one can stand up any other kind of service for the data. But there is no claim that these are the replacement for Whois servers at this time, because Whois is still formally defined as a way to provide registrant information.
Part two of your question was what happens when only some people are playing. To that point, the issue is, in the same way that the current Whois service is somewhat … really an island in deployment in that every registry has slightly different implementations of how they present Whois information. There is the danger that, with piecemeal deployment of IRIS servers, there is still some level of islands of data available, but not the whole space. The only solution to that is to develop what the deployment policy is for registries and registrars.
M. Cade: The task force does not have a consensus opinion on what data should be displayed or not. Are the present prototype deployments restricting access to data? Or are they examining other ways of presenting data?
A. Newton: I can speak to the one at VeriSign Labs. We do restrict the data to guard against data mining. It would take someone a year to actually go through all the data. There is a certain query rate, which I can't remember, but it's a pretty high number to hit. In fact, on our Whois servers, we don't actually see those types of query rates.
Once you stand up the servers themselves, that allows the clients to start playing around with how they want to see the data to begin with, and that's the important thing. So the intent of our current server is to allow clients to access the data and to be able to see what they need to with that data in a highly structured manner.
M. Cade: Who would you define as the client in this case?
A. Newton: We have two methods of access. We have the IRIS court access, which is what's defined in the national protocol spec. So someone can go download an IRIS client, the client software, and they can access it. There are two pieces of client software available, one written in Perl and one in Java. They can go access this data.
The other way is we have a Web interface. Our Web site talks to the IRIS server, so that way they don't have to actually go download a client. This allows them to get to the data via a Web interface. If you go to the Web site, which is iris.verisignlabs.com, we play around with how that data should look to users, etc. Because it's not unstructured text data, we can do that.
M. Cade: I'm sorry, my question wasn't clear. When I use the term client, as a consultant, I mean the people I'm working for. I should have asked what kind of users are accessing the data, by category. Law firms? Registrars? Law enforcement? Business organizations? Do you know?
A. Newton: We don't know, because we haven't placed any strong authentication on the server at present. One reason we haven't is, first off, we don't want to step on anyone's toes, especially the ICANN Whois task force. So we're waiting to see what you come up with in that regard. In addition, we're only a thin registry for com and net at the moment. We're not sure how much of the information we have is all that interesting to some users. We are looking at what we can do, what makes sense, as far as restriction of the data and data access to certain clients. But we don't know, at present, who they are.
L. Daigle: So it's not the case that we've been directly targeting … customer segments and saying, "Here's a new service you should go and use." It's more a case that you put the server up because, as you know, this whole process is … Having defined the protocol, it's now important to actually put it up where people can actually work with it, … can go and play with it and get a sense of "Aha, so this is what's feasible technically." Get a sense of what the range of possibilities are to help inform the range …
M. Cade: My next question has to do with the statement you made that authorization is easier and less burdensome than some people think. The two areas that I saw suggested as forms of authentication did not seem to take into account the developing country issue, where the situation with certificate authorities and the use of credit cards is very different than it is in the developed world. We are assuming that authentication will be possible, but we're not assuming it will be mandatory unless there is a policy recommendation. Is that right?
Coordinator The next question comes from Maureen Cubberley.
M. Cubberley: Marilyn, thanks for introducing the questions that way. I have the same view, that some technical things may be complex, but they seem to flow into some fundamental questions. I'd like to ask a question about slides ten and eleven, where we're talking about tiered access and authentication distribution. When we talk about the multi-type authentication methods, does IRIS draw a line between the categories of viewable information, the types of viewable information that exist? How customizable is that?
M. Cubberley: So it's infinitely customizable. Right?
M. Sanz: Right.
Coordinator No further questions.
J. Neuman: I think you said something about the fact that response times shouldn't be affected by layering the IRIS protocol on top.
A. Newton: No, it shouldn't be.
J. Neuman: That's true whether it's a thick or thin registry?
A. Newton: It's better for a thick registry. It helps in both situations, but a thin registry doesn't have as many indices to look across as a thick registry. Because the queries are well structured and a lookup is defined as hitting one index, in a thick registry you'd probably get better performance.
You can look at a DNS server, which actually handles much more load per CPU than a Web server. The reason is the vast majority of their traffic is … Those are the things we're still refining. We're even talking about a more refined TCP transport, but that's work we're just now looking into. So there would be no reason why you couldn't meet the SLAs you have now. In the future, this protocol will be even faster.
Coordinator There's a question from Marilyn Cade.
M. Cade: I have a number of questions. It may spark questions from others. A couple of things come to mind. We opened the conversation by saying that the existing NICNAME/Whois protocol was developed many years ago. We all know we've learned a lot, and that in fact the Internet has changed a great deal since then. So if IRIS were implemented today, if there were a consensus policy, for instance, that supported the implementation of IRIS (and I'm speaking hypothetically), if it were implemented today with full display of all data that is gathered, what exists in IRIS that will help with the accuracy problem, which is a serious problem in today's Whois?
Secondly, what exists or could exist with the implementation of IRIS that would allow discrimination between categories of registrant? For instance, in a dot-post, which is announced by ICANN staff as being under negotiation or their authority and post, as I recall from their application, provides something equivalent to a post office box, where they gather accurate data but that accurate data is not displayed. But in another TLD, there might be a different category of use, like a user who says, "I'm an individual and I've authenticated." Does IRIS allow us to discriminate in the display of data between different categories of users?
My next question has to do with trying to understand the cost and burden on registries and registrars of moving in this direction, and whether there is a transitional period or cost. As I know from having run a business, any time you adopt new software or a new means of doing something, you have a number of both hard and soft costs, including training of your people. What are the cost areas we should be thinking about if we were to move forward in this direction?
A. Newton: The big answer to that is that people will be much more willing to make sure their data is accurate, or be much less apprehensive about providing accurate data, if they didn't think it would be shown to everyone on the planet.
L. Daigle: I think that's right, and it does somewhat lead into the second question about distinguishing categories of users. The point is that, by allowing signing of data, etc., from a display perspective, you can make the distinction between types of data and types of users. It becomes a question of deployment realities whether that is useful or not.
M. Cade: Andy, I've noted your comment, and Leslie's support, for your personal view that people will be more willing to provide accurate data, or less apprehensive, if not shown to everyone on this planet. I think that the task force doesn't have a consensus view on that.
L. Daigle: What we're saying is that is the limit and extent to which IRIS supports a solution in the accuracy problem. We recognize there is a much larger issue about detecting and ensuring accuracy of data, and that's beyond the scope of IRIS.
J. Neuman: Marilyn, can we see if there is anyone else in the queue?
M. Cade: We can, but I just want to pursue this, since the task force doesn't have consensus on that particular point, and I understand that IRIS isn't the answer to accuracy. Are there other steps that could be taken or other activities underway that, leaving aside the display issue, could address the accuracy problem?
L. Daigle: The answer is that all technology can do is help to accurately convey the level of confidence in the accuracy of the data. You can convey that the registry/registrar believes this data to be accurate, that's all you can do.
M. Cade: Can you go on to address my last question, which was more about what you … different areas of cost?
L. Daigle: It is the case also that an individual user can say I'm willing to have my address shown publicly, or no I'm not.
M. Cade: Good answer, both were helpful.
J. Neuman: Operator, anyone else in the queue?
Coordinator Next question from Mickey Mouse.
J. Neuman: I suppose that was probably not a serious one.
Coordinator It was someone who refused to give their name, sir. In that case, we'll take the next question from Ryan Lehning.
S. Metalitz: This is Steve Metalitz with Ryan. A follow-up question to one or two of Marilyn's questions. When you say it's not a provisioning protocol, it doesn't affect, then, what goes into the Whois database. It's just a question of what comes out in response to a query. Is that right?
A. Newton: Correct.
S. Metalitz: In that case, the question about different registrants specifying different data to be made available, that does require categorizing registrants. I'm not clear on how IRIS helps with that. You'd have to distinguish between a registrant who says, "All my data can be available," as opposed to a registrant who says (let's assume the policy allowed it), "None of my data is available."
L. Daigle: It's true that that distinction has to be made within the database, but what will happen is, when an IRIS query comes in and it's made to the database, the data either is provided as part of the response, or it is not. Inbound on the query is also an indication as to whether this is a general anonymous question, or whether there was any level of authentication and credentials available.
S. Metalitz: I understand it can distinguish between Whois requestors. In terms of distinguishing among domain name registrants, just to take the domain name side of this, it doesn't do that. There has to be a distinction in there that it will …
L. Daigle: The distinction must be captured in the data held by the registry/registrar.
A. Newton: EPP does that; it has those types of capabilities already in it. So registrars can tell the registry the desire of the registrant.
J. Neuman: EPP has that in the protocol, and the registries are in the process of implementing it, but most registries at this point, on the policy side, don't activate those privacy flags yet.
A. Newton: That's a good point. Again, here is something that the protocol provides that policy has not stated as needed, or has maybe even overridden in some cases. So the registrants could be saying I don't want any of my data showing; it's just not a policy decision that's allowed to be made. Take the example that a registrant has said, "I don't want any of my data to be shown to anybody ever except for me," and a law enforcement person comes into an IRIS server and authenticates as law enforcement. It doesn't matter what the registrant asks for. The policy is that law enforcement gets to see the data. Or it could be that they don't. It's all a matter of policy.
S. Metalitz: To the extent that there is a differentiation, holding the requestor constant, in what's returned from the query based on characteristics or desires of a registrant, IRIS does not have that capability. That would have to be an EPP capability that's used, or else that protocol would have to be adapted to provide that capability, right?
L. Daigle: We're having a problem with terminology. I think the right answer to convey what you want to hear is yes, but I'm not comfortable saying that IRIS doesn't have the capability in distinguishing between users because I believe it does.
S. Metalitz: It does, if that distinction is already there in the database.
L. Daigle: Right. There would be no way for IRIS to fabricate or interpolate what the registrant's desires were.
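The division of labor described in this exchange can be sketched in a few lines: the registrant's preference is stored with the record at provisioning time, and the query server combines it with the requestor's authenticated class and the operator's policy. This is only an illustration, not IRIS itself; the requestor classes, the `disclose_contact` flag, and the `policy_overrides` parameter are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical requestor classes; real categories would be set by policy.
ANONYMOUS = "anonymous"
REGISTRANT = "registrant"            # the registrant querying their own record
LAW_ENFORCEMENT = "law_enforcement"


@dataclass
class Record:
    domain: str
    registrant_name: str
    phone: str
    # Preference captured when the data was provisioned (for example via EPP).
    # The query server cannot invent this value; it can only read it.
    disclose_contact: bool


def build_response(record, requestor_class,
                   policy_overrides=frozenset({LAW_ENFORCEMENT})):
    """Return only the fields this requestor is allowed to see."""
    response = {"domain": record.domain}
    may_see_contact = (
        record.disclose_contact
        or requestor_class == REGISTRANT
        or requestor_class in policy_overrides
    )
    if may_see_contact:
        response["registrant_name"] = record.registrant_name
        response["phone"] = record.phone
    return response
```

Changing `policy_overrides` changes who can see withheld data without touching the stored record, which mirrors the point that the outcome is a matter of policy rather than protocol.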
Coordinator The next question comes from Paul Stahura.
P. Stahura: I'm worried about people who have the correct authentication getting the data and then distributing it to people who don't have the correct authentication. It's a policy thing. I'm wondering if IRIS can help. Let's say we had a policy that said people with this authentication could only keep the data for 24 hours … Can IRIS help us there?
M. Sanz: I'm sorry to say I don't think IRIS can help because once the data has been delivered, there is no real way to track what is happening with this data and for how long the data will be kept at the other side.
A. Newton: That requires proprietary software to accomplish. It's really hard to watermark data in such a way that people don't notice, and this is all textual data, which is much harder to watermark. On top of that, if you want to do something similar to DRM, like what iTunes does, that is proprietary software and that's exactly how they accomplish that task. You can't really say we're going to have an open standard, and then require everyone to get a certain piece of software.
P. Stahura: I understand that if somebody is not following the protocol, they could ignore it. Let's say there was … to live on this piece of data, they could ignore that but then they would be breaking the protocol …
L. Daigle: The issue is that they're breaking the policy. What you need is mechanisms for detecting if there are continuous … doing that. For instance, if there is a registry client that has signed an agreement not to do that and they're repeatedly doing it, there should be the ability to turn around and revoke their rights.
To follow on Andy's iTunes example, I don't know how many people are familiar with that software and service. Apart from the fact that it's proprietary software, two important aspects need to be understood about how it achieves the control of the digital material. It not only requires their specific software, it also means their software has to anticipate what the appropriate and likely use cases are. Those are the ones supported for all users across the globe. That's not the sort of thing that we've seen as being applicable for Whois. Marilyn mentioned some ways in which different regions of the world are different in terms of their needs and abilities.
I'd also observe in the case of iTunes, since this is music, which is very popular, it wasn't that long before people started to use software which ripped … controls out of the data. Then you're right back to square one. If it was of sufficient interest to people to do that here, exactly the same thing would happen.
A. Newton: If you want a short answer, yes, you can put a TTL on the data if you really wanted to. I don't know if that's what you were getting at, but you can put a time to live on there and if someone violates it, they're a bad actor.
L. Daigle: You can provide mechanisms; you can't provide enforcement.
P. Stahura: I'm just wondering if it was in the protocol at all. I understand that if somebody breaks a policy by ignoring the TTL, or if the policy was in a contract with the people getting the information to say you can't store it or pass it on, I see that. They can't help us because we're not implementing some kind of digital rights management thing, and we would need a proprietary client, … on the front end, to enforce all that. I see that. But if it had a TTL, people who got the "standard reference clients" would at least have that. Then they'd have to build their own client if they want to violate that TTL.
A. Newton: You can put in those types of policies, if you want. If you go to the com net IRIS server, you can see where we have stuck in that you're not allowed to use the data for spamming. It's not going to stop anyone from using it for that, but we have that in there where we put the notice. You can't use it for spamming, so we've given them the notice. If they do it, they're in violation.
P. Stahura: I see that. But if you had a TTL, then your Web based clients would not be able to store the information if you complied with the spec. Right?
A. Newton: If you're asking specifically if there is a TTL in the actual current dreg spec, no, there is not. Could one be added very easily? Very, very easily. I can do it in about two minutes. But I want to point out that this is textual data. Someone could easily cut and paste it and it wouldn't even require implementation of the client to do it. Then off they go.
M. Sanz: This is why I said at the beginning that it's not feasible.
Coordinator We're showing no more questions.
J. Neuman: Okay, a follow-up on the audit trail comment you made in the presentation. Would this make it possible for the server to collect information about the requestor of the information and distribute that information out to the registrant?
A. Newton: Yes. I want to be explicit about this. Audit trail is actually not a function of the protocol, it's a function of the implementation of the protocol. You do have the benefit of the fact that, because the data is highly normalized, it's well understood what was accessed and by whom. When you use strong authentication, you can say here is the actual serial number of the certificate or the user ID of the person who came and got this data. And they got this specific piece of data, they didn't just query. They entered this and got a bunch of data back. So the audit trail would then be meaningful to a registrant or any third party.
L. Daigle: And if you were serving up data with only on access, then you could provide information in terms of the number of accesses to the data.
M. Sanz: And since the queries are much more structured, you can really categorize, and have fine-grained policies in the sense of … person is only allowed to make ten queries regarding person data per minute, but he's allowed to make 100 queries regarding domain data per minute. Because you really know what the person is querying, you have very fine granularity.
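A per-query-type limit of this kind could be enforced on the server roughly as follows. This is a minimal sketch, assuming a sliding one-minute window and an authenticated user ID; the class name, the `LIMITS_PER_MINUTE` table, and the injectable `clock` are assumptions made for the example and are not part of any IRIS specification.

```python
import time
from collections import defaultdict, deque

# Hypothetical limits, taken from the figures mentioned in the discussion.
LIMITS_PER_MINUTE = {"person": 10, "domain": 100}


class QueryRateLimiter:
    """Sliding-window limiter keyed by (user, query type)."""

    def __init__(self, limits=LIMITS_PER_MINUTE, window_seconds=60.0,
                 clock=time.monotonic):
        self.limits = limits
        self.window = window_seconds
        self.clock = clock                      # injectable for testing
        self._history = defaultdict(deque)      # (user, type) -> timestamps

    def allow(self, user_id, query_type):
        """Record the query and return True, or return False if over limit."""
        now = self.clock()
        stamps = self._history[(user_id, query_type)]
        # Drop timestamps that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.limits[query_type]:
            return False
        stamps.append(now)
        return True
```

The limiter only works because the server knows the type of each query, which is the point being made about structured queries.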
A. Newton: Also, there is a particular type of … access mechanism called one time passwords. It's basically a cryptographic password solution where a password is only good a certain number of times. So you issue the password, and it can only be used a certain number of times before cryptographically it can no longer be used again. So not even an account administrator could up the number or whatever. Once the password is handed out, it can only be used a finite number of times for access.
M. Sanz: In the case of certificates, this is much more extreme. Even if you were able to keep track of old data that the client has sent to the server, there is no way for the server to impersonate the client.
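One well-known way to get passwords that are good for only a fixed number of uses is a hash chain, in the style of S/Key. The speakers did not say which scheme they had in mind, so the sketch below is an assumption: the function and class names are hypothetical, and SHA-256 is chosen only for illustration.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def make_passwords(secret: bytes, uses: int):
    """Return (anchor, passwords).

    The anchor is the secret hashed `uses` times and is given to the server.
    The passwords are the intermediate values, listed in the order the
    client must present them (most-hashed first).
    """
    values = [secret]
    for _ in range(uses):
        values.append(_h(values[-1]))
    anchor = values[-1]
    passwords = values[-2::-1]      # h^(uses-1)(secret), ..., secret
    return anchor, passwords


class Verifier:
    """Server side: stores only the last accepted value."""

    def __init__(self, anchor: bytes):
        self.anchor = anchor

    def verify(self, password: bytes) -> bool:
        # A password is valid only if hashing it yields the stored value.
        if _h(password) == self.anchor:
            self.anchor = password  # the next password must hash to this one
            return True
        return False
```

Once the chain is exhausted there is no value that hashes to the stored one without inverting the hash, which is why the number of uses cannot be raised after the fact, even by someone with access to the server's stored value.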
J. Neuman: Okay.
Coordinator We have a follow up question from Marilyn Cade.
M. Cade: Jeff, have you gone through the questions that were submitted? Because you just asked one I know had not been asked. Is that what you were going to do next?
J. Neuman: That's what I'm going through now.
M. Cade: Okay, because I wanted to ask a follow-up question in line with what we were just talking about and consistent with the questions that we're going through. On the ability to identify who has submitted the request, to gather data on the type of request that's being asked, whether it's just availability of a domain name or actually information about contact, etc., is there any mechanism in IRIS that would, when incorrect data is identified, flag that? Or, if fields are left blank, is it possible to print out a report showing that a certain percent of the data that is being input by the registrant is being left blank, those kinds of reporting statistics?
M. Sanz: How do you mean block that, not being delivered to the client?
M. Cade: You're right because this is an overlay to the underlying database.
M. Sanz: It's just about delivering, not entering, the data. Obviously, in the implementation you have, you can keep track, whenever you are answering a query, of whether you found not-well-formed data in your database. That's possible. But I think that's too late. You are just realizing you have something inaccurate or not well formed; you should have checked that before entering it into the database. Does this answer your question?
M. Cade: I don't think so, but it was helpful in any case. I'm learning more. Let me refine my question. One thing that the registrars are responsible for is, upon being notified that there is inaccurate data, they need to take steps to notify the registrant of the registrant's responsibility to correct that data.
M. Sanz: I see. It's much easier with this protocol to identify inaccurate data, in the sense that the protocol is much more structured than Whois was, so now you have very well defined fields. Here you have to have a telephone number and it has to have this format, otherwise it's wrong. The protocol itself provides this information, XML information, so it's really easy for a registrar to check the results against a pre-defined format and become aware of whether the data is fulfilling the rules defined in the schema.
M. Cade: That's what I was looking for was the reporting …
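The kind of format check and blank-field reporting discussed here might look like the sketch below. It does not use the actual IRIS XML schema; the required field names and the `+CC.NUMBER` phone pattern are assumptions chosen for illustration. A check like this finds data that is not well formed, but it cannot tell whether a well-formed value is true.

```python
import re

# Assumed format for the example: plus sign, country code, dot, subscriber number.
PHONE_PATTERN = re.compile(r"\+[0-9]{1,3}\.[0-9]{1,14}")
# Hypothetical set of required fields.
REQUIRED_FIELDS = ("name", "phone", "email")


def check_record(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"{field}: missing or blank")
    phone = record.get("phone")
    if phone and not PHONE_PATTERN.fullmatch(phone):
        problems.append("phone: not in +CC.NUMBER format")
    return problems


def blank_field_report(records):
    """Return, per required field, the percentage of records leaving it blank."""
    total = len(records)
    report = {}
    for field in REQUIRED_FIELDS:
        blanks = sum(1 for r in records if not r.get(field))
        report[field] = 100.0 * blanks / total if total else 0.0
    return report
```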
Coordinator No more questions at this time.
J. Neuman: One other question I had was on the Whois servers. If there is an IRIS server run by the registry, it would be that individual server that defines what the output is, correct?
A. Newton: Ultimately, yes. So let's say that ICANN says there is a policy that certain classes of users see certain types of data. The decision is ultimately made by the server operator, who sends that to the client. If they didn't do what ICANN is asking them to do, then they are in violation of ICANN policy. But you're right. However, because the data is well structured, it's easy to figure out when they're not conforming with policy.
J. Neuman: One more question on data structure. Although you initially talked about the cost being pretty low, are there costs imposed on the operators for existing names to put all that data in the format that's specified by IRIS?
A. Newton: No, the intent is that you don't have to restructure your database or do anything to it. You're just accessing the database in the same way that the current Whois servers are doing it. We actually spent some time on this in the CRISP working group, wanting to make sure the protocol itself didn't require any type of reorganization of indexes or anything like that. We felt if that occurred, suddenly this is too burdensome to implement.
L. Daigle: I would refine that slightly and say the cost essentially is in the software that does the translation, the mapping between your existing software and providing data in the IRIS protocol. It doesn't have to be a flat-out conversion of data and databases or the backend software tools. The other thing I would say is, to the extent that there are changes needed in that backend database, they are going to be required to support policy changes; that is not a software question.
J. Neuman: So it will be the translator. Let's just take a phone number, for example. Some registries start the country code with a plus sign versus just having the number out there without the plus sign, some have it separated by periods and some might use hyphens. You're just saying there would be conversion software that would convert it when it's displayed; it wouldn't make any changes in the database.
L. Daigle: Right. Whereas, if there are policy changes about only displaying the first five digits of telephone numbers for people whose last name starts with D, that's something that would be needed whether you're using IRIS access or Whois.
J. Neuman: Okay. That answers all the questions I have in the form we gave you.
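The translation layer described in this exchange can be illustrated with the phone number case. This is a sketch under two stated assumptions: the stored value always includes the country code (with or without a leading plus sign), and the target display form is a plus sign followed by digits only. The function name is hypothetical. The stored data is never modified; only the value sent in the response is rewritten.

```python
import re


def normalize_phone(raw: str) -> str:
    """Convert a stored phone number to a single display form.

    Accepts values such as "+1.555.555.0100", "1-555-555-0100", or
    "15555550100" and returns "+15555550100" for each of them.
    """
    digits = re.sub(r"\D", "", raw)     # drop plus signs, dots, hyphens, spaces
    if not digits:
        raise ValueError(f"no digits found in {raw!r}")
    return "+" + digits
```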
Coordinator The next question comes from Steve Metalitz.
S. Metalitz: It's a follow up on the same point. This goes to the question of the accuracy of the data in the database. I thought I heard the statement that the accuracy would be improved because the fields are well defined. Let me take an example. Let's say the phone number listed is 1-2-3, which we know is inaccurate. Can anyone explain how IRIS would help you determine that this was inaccurate?
L. Daigle: On the outside, in terms of a client looking in as opposed to the provisioner, the issue is that you understand that this is asserted to be the phone number as opposed to wondering whether this was a fragment of, for instance, a zip code.
S. Metalitz: So as the Whois requestor, I would have a higher degree of confidence than today that this is really the phone number listed in the database.
L. Daigle: You understand the assertion that this is the phone number.
M. Sanz: It's not that you have a higher confidence that this is the phone number listed in the database. You can have absolute confidence that this is the data as delivered by the server. The point is, this is not the definition of accuracy. It depends on the definition of accuracy that you're using. You do have accuracy if you interpret it as: this is the data delivered by the server. But 1-2-3 is obviously not a valid phone number.
R. Lehning: This is Ryan. Can you have confidence that that was what the registrant entered as the number?
M. Sanz: You can only have the confidence that that is the data that the registry is delivering you. If you tracked the registry-
A. Newton: That goes back to a disconnect in the terminology. When I hear the question about accuracy, I think of the provisioning side. But the question you've been asking is about accuracy in how the client understands the structure of the data. As to what's in the database, that's more of a provisioning instrument that must be undertaken. I think ultimately, there are technologies to help detect inaccuracies, but there are not technologies to make it accurate.
S. Metalitz: My other question was the audit trail question. It was stated that this is not a function of the protocol but of implementation. So our questions about whether there is a record of the query, where it is stored and who has access to it are questions that would be decided by the implementation of the protocol. The implementation could say you only have access to it if you present a court order, or it could say the registrar has access to this for marketing purposes. It could say anything.
L. Daigle: Right.
Coordinator The next question is from Maureen Cubberley.
M. Cubberley: On an entirely different topic, one of the earlier questions was what does this mean if not all registries adopt it. You do have an item in your presentation about port 43 cohabitation. What are the main issues you're working on in order to enable both IRIS and those registries that choose to stay with what they've got with port 43? What are the issues around compatibility and ensuring that it is a closed running system and that the access protocol you described is complementary, if not supportive, to what's already happening with those registries that stick with port 43?
A. Newton: Port 43 is not all that complicated a protocol. You could write a Whois client in about five or ten minutes. Putting that functionality into an IRIS client is really not that difficult. The thing is that the end user sitting at the IRIS client doesn't get the IRIS benefits when accessing Whois data over port 43. When we talk about cohabitation, it's a way of allowing the navigation elements within IRIS to actually be back quartered to port 43, so you can find the correct Whois server. You still don't have query distribution and some of the other nicer things built on IRIS, but at least you can do that. You would have one piece of software that could talk to the Whois servers, and one to talk to the IRIS servers. You don't get all the IRIS functionality out of port 43, but it means you don't have to go through different ways of getting that data.
M. Cubberley: So one query will go to wherever. There's not a hierarchy of ranking how it's accessed?
L. Daigle: There is the ability to express preference. Naturally that would be towards IRIS servers, because the truth is it's IRIS clients who will be making use of this. Whois clients have no notion of this idea of cohabitation or anything else. But it is, at least, a mechanism for bringing the Whois servers into the fold on the navigation side of what IRIS provides. You get to that point, and there is a hand-painted arrow pointed down a pot-holed, bouldered dirt road. Go down there at your peril, but there it is.
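The remark that a port 43 client takes only minutes to write can be illustrated directly. The Whois exchange is a single line of text sent over TCP, terminated by CR LF, after which the server sends free-form text and closes the connection. The sketch below uses only the Python standard library; the function names are chosen for the example, and the server name in the usage note is a placeholder, not a real service.

```python
import socket


def build_query(name: str) -> bytes:
    """Encode a Whois query: the name followed by CR LF."""
    return name.strip().encode("ascii") + b"\r\n"


def whois_query(server: str, name: str, port: int = 43,
                timeout: float = 10.0) -> str:
    """Send one query to a Whois server and return the raw text reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(build_query(name))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:            # server closes the connection when done
                break
            chunks.append(data)
    # The reply has no declared encoding or structure; decode leniently.
    return b"".join(chunks).decode("utf-8", errors="replace")
```

A call such as `whois_query("whois.example.net", "example.org")` would return unstructured text that the caller must interpret by eye, which is the contrast being drawn with IRIS responses.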
Coordinator Next question is from Marilyn Cade.
M. Cade: We could have skipped past something you said earlier that is actually very important to ISPs and big infrastructure providers, and to business users themselves. We rely on not only the DNS Whois but also the Whois maintained by the RIRs. You mentioned one thing IRIS does is bring together a form of consistency across those two "Whoises". Could you elaborate on what the status is with the RIRs and what's going on to the extent you know it?
A. Newton: All four RIRs are current participants in the CRISP working group. One of the co-chairs of the CRISP working group is George Michaelson, who is the CTO of APNIC. The RIPE NCC is doing a lot of work in this area; Engin Gunduz and Shane Kerr are now owners of the draft that talks about how the address registries are to be structured over IRIS. In fact, we believe that we'll go to last call pretty soon in the CRISP working group. They worked pretty strenuously to get this right. We have engaged with ARIN and LACNIC and … as well. ARIN is very involved as well.
A lot of what was going on there is coordination between the RIRs and the different data models. Between the four RIRs, there are three different types of data models, three different types of registries. A lot of that was making sure we got it right, and I believe we do now. We started it a year ago, it seems, and so ARIN and APNIC and RIPE NCC are all talking, and they represent each other within the CRISP working group. We're now going ahead with what we're calling areg.
M. Cade: I'm sorry, ARIN? APNIC?
A. Newton: … LACNIC which is Latin America and RIPE NCC.
M. Cade: The provisional AFRINIC is participating or not yet?
L. Daigle: We haven't seen any evidence of them, but my suspicion is … because they've got more primary things on their minds.
A. Newton: A lot of how things work in the RIR space, I don't want to speak for them but I'll say what I've observed: the RIPE NCC has software they developed, and it's reused by the other RIRs. What happens is once the RIPE NCC has done it, the others will take it on. This is extremely true with the Whois servers; the ones run by APNIC are the ones that RIPE distributes. This is true of what they call the NIRs, as well.
M. Sanz: I wanted to confirm that is the behavior I observe, as well.
Coordinator Next question from Chuck Gomes.
C. Gomes: Several times, it's been stated that the provisioner of data is the registry. I'm assuming that, because of the distributed nature of IRIS, the provisioner of the data could actually be a registrar. Then you don't have the duplication of both registry and registrar being authoritative for the data, while at the same time the registry would be the provider of the IRIS Whois service. Is that correct?
A. Newton: I don't understand the question, Chuck.
C. Gomes: Several people said the data is coming from registry in every case. That could just as well be a registrar instead. Is that correct?
A. Newton: Yes. When I say registry, I'm actually meaning a generic versus a thick/thin domain registry or registrar. A registrar is a type of registry. I'm sorry if I was inaccurate.
Coordinator Next question from Paul Stahura
P. Stahura: In your slide, you said registrars may point to registrants, and registries may point to registrars. I understand that. What if some of the information is contained in the registry for a particular query, and some of it is at the registrar? I assume there is a provision for replying with some information from the registry, some from the registrar?
A. Newton: It really is all in how the client wishes to present that data to the end user. The important part of the server side is the server knows how to properly say here is the domain information I have, and by the way, I know registrar X has this domain information as well. You may go access it over there. How the client wishes to present that is really a presentation style of the client. I've got certain theories on how best a client would do that which I've tried to implement with our GUI work, I know others have talked of other ways of doing it. But the client software can access the data in both places. The MC reference is well understood by the client so he knows the registry and the registrar have some data at the same time, and … go get it in both places.
M. Sanz: The protocol supports that you can really combine data from different repositories.
P. Stahura: What if the data is different? For example, if the registry says this is the registrant name, and by the way, you could get this other information at the registrar. Then the registrar says this is the name and it's different. Then the client has to reconcile that somehow. Which one does it use?
A. Newton: That's a matter of how it wishes to present the data. In the client software I have, I have what's called a tree widget or tree control which is used for doing the representation. You can see that here is the data the registry has. Underneath that, you see what the registrar has. So the user can bring up both pieces of data, put them side by side and see they're different. Specifically, the client could actually be programmed to flag that and start blinking in big red letters that they differ or something. But that's a function of how the client sees the data.
L. Daigle: The important thing to keep in mind is IRIS is making sure it keeps track not only of data for this given record but also … key element of what it is aware of.
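The side-by-side presentation described here is a client-side choice, and a client could flag disagreements mechanically. The following is a minimal sketch, assuming each source's answer has already been parsed into a flat dictionary of field names to values; the function name and the field names in the example are hypothetical.

```python
def compare_sources(registry_data, registrar_data):
    """Line up two answers field by field and mark where they disagree.

    A field counts as a conflict only when both sources supply a value
    and the values differ; a field present in just one source does not.
    """
    rows = []
    for field in sorted(set(registry_data) | set(registrar_data)):
        from_registry = registry_data.get(field)
        from_registrar = registrar_data.get(field)
        rows.append({
            "field": field,
            "registry": from_registry,
            "registrar": from_registrar,
            "conflict": (from_registry is not None
                         and from_registrar is not None
                         and from_registry != from_registrar),
        })
    return rows
```

The sketch deliberately does not pick a winner; deciding which source to trust is left to the user or to policy, as in the discussion.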
P. Stahura: My second question is related to query limit. Someone said based on the certificate or authentication of the user, that user not only might not be able to see certain information but might be limited to a certain number of queries per day. Is that right?
A. Newton: Yes. It's really a matter of policy.
P. Stahura: So my question is, for example, and I could be wrong, can a user type in firstname.lastname@example.org as an e-mail address and query the system to say return domain name Whois information for any domain name with that e-mail address in one query?
A. Newton: I believe so. You're asking if the user can submit a query, saying I want all the domain names with a certain e-mail address, and they'll get all the domain names? Or are you asking if the server can limit how many domains they get back?
P. Stahura: If there was no limit, they'd get all the Whois information for all the domain names at hotmail, let's say.
A. Newton: Not necessarily. The server can limit the number of answers it gives back. So even though the answer could be 1,000 domains, the policy can be that you can only see 100 of them.
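A server-side cap on search results could be as simple as the sketch below. The in-memory `index` mapping domain names to contact e-mail addresses, the function name, and the `truncated` flag are all assumptions made for illustration; a real server would query its own database and apply whatever limit its policy sets for the requestor.

```python
def search_by_email(index, email, max_results=100):
    """Return at most `max_results` domains whose contact e-mail matches.

    `index` maps domain name -> contact e-mail address (hypothetical store).
    """
    wanted = email.lower()
    matches = sorted(d for d, e in index.items() if e.lower() == wanted)
    return {
        "domains": matches[:max_results],
        # Tells the client that policy withheld some answers.
        "truncated": len(matches) > max_results,
    }
```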
Coordinator No further questions at this time.
J. Neuman: I'll take this opportunity to thank our presenters, Andrew Newton, Leslie Daigle and Marcos. Thanks again for your presentation. It has been sent to the task force list, which is a publicly accessible list. If there is anyone who can't access it, feel free to send the gnso.secretariat an e-mail to get a copy of the presentation. I'm sure we'll have a number of follow-up questions for you guys, possibly a follow-up call in the future. Thanks a lot.
L. Daigle: I'd like to say thank you, too. I appreciate the opportunity to speak about this. I'd like to end on the point that we're only trying to build technology that will suit the needs that you're defining. To the extent you can keep us aware of the need and progress in your deliberations, it will help our process in meeting our goal. So thanks a lot.
M. Sanz: It was my pleasure, as well, to have somebody who wants to hear about this.
J. Neuman: Thanks for joining the call.
The next task force call