[registrars] Draft Registrar Submission to TF1
I drafted this submission to whois TF1 on Restricting Access (whois data mining). I attempted to gather the input from registrars in Rome, on the list, and in private emails and calls. I tried to incorporate all registrar viewpoints. I did not weigh other constituencies' viewpoints highly. If you would like to make modifications, please let me know, or just go ahead and make changes (with changes turned on) and send them to me or the list.

Below follows a plain-text version (see attached for word doc). Three other TF1 constituency statements are also below.

Paul

RC Statement Vers1 On Whois TF1: Restricting Access/Data Mining

The registrars' policy recommendation for the Restricting Access/Data Mining whois task force (TF1) depends heavily on the results of the "Data Collected and Displayed" whois task force (TF2). If, for example, TF2 determines that the data to be displayed, especially via port-43, is limited to non-sensitive information ("non-sensitive information" defined as the domain itself, name servers, organization-names, and the registrar-of-record) and that information is not personally identifiable information, then the information to be mined will be of less value to miners and hence, mining will be reduced. On the other hand, if TF2 determines that sensitive information ("sensitive information" defined as, but not limited to, person-names, street addresses, phone numbers, and email addresses) is to be displayed, then there will be a great incentive to mine the data because it will be more valuable. There is also a dependency on TF3, because if the data is 100% accurate, and at the same time, mandated to be displayed, then it becomes even more valuable, which further increases the motivation for mining.

The whois data is the registrant's data. It should remain in the control of the data subject as much as possible.
As the whois data moves away from the registrants to the registrars and further, to fat registries, and to even more distant 4th and 5th parties, it becomes less and less in the control of the registrants. The registrars should not be obligated to provide whois data to any party that cannot guarantee that the data will be treated in a manner consistent with the policies and legislation under which it was collected. Therefore, any data collected from registrants must remain as close as possible to the registrants, at the registrar. As the whois information is passed to these other entities, more access policy-control problems are created (because there are geometrically more locations at which to mine the data). Because the registrars will always be closer to the registrants, and in between the registry and the registrant, the utility of a thick registry model should be evaluated.

If TF2 determines that sensitive information must be displayed, the registrars support a policy whereby registrars may:

1) Shut off port-43 access to the public, if not remove it completely for all.
   a. If not completely removed,
      i. Who is "the public" and who is not would need to be defined.
      ii. Registrars must be granted access to port-43 whois, in standardized format, but only for the purposes of performing transfers and only for so long as all gTLD registries are not EPP (thick or thin) or until another inter-registrar transfer mechanism replaces it.
      iii. Port-43 query rate limiting must be allowed.
      iv. The identities of the non-public requestors must be known to the registrars and may be recorded by the registrars so that they can be communicated to the registrants.
      v. The requestor must have a defined, valid purpose for each request and that purpose must be known to the registrars and may be recorded by the registrars so that it can be communicated to the registrants. Some registrars believe a valid purpose exists currently and some do not.
      vi.
          The requestor cannot act as a proxy.

2) Display the whois information on a publicly accessible web site, but only in a manner such that the information cannot be easily mined, and consistent with the policies and governmental laws under which it was collected. It is the registrars' real-world experience that CAPTCHA systems (systems that perform checks for humans, such as requesting a person to type in a number to access a single whois record) and other systems (such as tracking the number of queries from a particular IP address), though imperfect, do work to greatly reduce automated data mining of the whois via the web. Registrars must continue to be allowed to use such systems.

3) Continue to provide "identity protection" products to registrants.

Because the result is the same (obtaining the totality, or a large portion, of the whois information), the registrars assert that the following are identical:

1) Mining of a registrar's port-43 output
2) Mining of a fat registry's port-43 output
3) Mining a 3rd party's port-43 that proxies access to any registrar's or registry's port-43 output
4) Mining the registrar's web-based display of whois information
5) Mining the fat registry's web-based display of whois information
6) Bulk access

Therefore, if the data elements displayed/disclosed are the same, whatever access policies and controls are put in place for one must be in place for the others. For example, if the identity of the requestor (and purpose, let's say) must be known for bulk access, then it also must be known for mining (high query rate) of port-43.

Summary of Positions

I. IPC

A. IPC supports, in principle, the use of query volume limitations on Port 43 access in order to discourage data mining.
B. Being supportive of the debate, the IPC submits that any changes in practice or regulation have to be designed in a manner that does not inadvertently have detrimental effects on the legitimate use of Whois.
C. Specifics:
   a.
      Any provision should maintain and ensure availability of unhampered access to Port 43 for legitimate applications that require high volume access to domain name Whois for use in creating value-added products and services that are of great value to the intellectual property community and to the business community in general.
   b. Adequate provision must be made for intermediaries which aggregate low-volume requests from end-users into a relatively high volume of queries through Port 43.
   c. A solution must identify realistic volume break-points between low-volume queries via Port 43 that should remain unrestricted, and a very high volume of queries that could, in principle, require an efficient and workable form of disclosure to registrars (or registries in the thick registry model) of the uses to which query results would be put.
   d. The solution should also preserve the unrestricted availability of Whois queries through a web-based interface, and the status of Port 43 as a service available free of charge.
   e. The solution must be accompanied by proactive enforcement of the obligation to make bulk access available.

II. ALAC

A. Two-tiered system.
   * Tier 1: Public Access. Users who access a future WHOIS-like system anonymously get access to non-sensitive information concerning a domain name registration, to be defined in detail by task force 2.
   * Tier 2: Authenticated access. Users who want to access a more complete data set (to be defined in detail by task force 2) need to reliably identify themselves, and indicate the purpose for which they want to access the data. The identity of the data user and their purpose is recorded by registrars and registries, and made available to registrants when requested. This information could be withheld for a certain amount of time if the data user is (1) a law enforcement authority that is (2) accessing the data for law enforcement purposes.
B. Implementation: No specific implementation recommended; example: SSL client certificates.
   [Prefer IRIS or other dedicated protocol over web forms.]
C. Rationale:
   * Find out purpose of use of Whois data. Registrars would have to verify purpose, but can't. Resort to heuristics.
   * The best heuristic we know of is to hold data users accountable for their activities, and to put enforcement of purpose limitations into the hands of registrants. This can be achieved by reliably identifying data users and putting their identity, contact information, and purpose indication in the hands of registrants.
   * At the same time, a tiered system -- if implemented reasonably -- could preserve the ability of data users to automatically access WHOIS data in reasonable quantities. Registrars, on the other hand, would be enabled to limit the amount of data any particular party can access in a given interval of time.
D. Discussion of other proposals
   * CAPTCHA: There have been suggestions that "automated access" could be used as a heuristic to determine illegitimate access. In this scheme, automated access is blocked by attempting to require human attention with all queries. One set of implementations of these kinds of tests is known as CAPTCHA.
     o CAPTCHA blocks legitimate automated access
     o Easy to circumvent because of design problems (See http://boingboing.net/2004_01_01_archive.html#107525288693964966 and http://www.cs.berkeley.edu/~mori/gimpy/gimpy.html)
     o Accessibility issues: http://www.w3.org/TR/turingtest/
     o In Sum: Do not recommend.

III. Noncommercial Domain Name Holders

A. ICANN must recognize that the purpose of Whois originally was identification of domain owners for purposes of solving technical problems. The purpose was not to provide law enforcement or other self-policing interests with a means of circumventing normal due process requirements for access to contact information.
B. NCUC does not believe it is possible to develop technical mechanisms that can restrict port 43 or port 80 access only to a specific type of purpose; e.g., "nonmarketing uses."
   Access restrictions imposed by TF1 will inevitably apply to any whois user regardless of purpose. Moreover, restricting Port 43 access while leaving Port 80 open will only drive the automated processes to Port 80.
C. Therefore we question whether TF1 can achieve anything of value. The task force should refrain from making judgments about the legitimacy of, justifications for, or "need" for any non-marketing uses. It is outside the scope of TF1 to make any such determinations.
D. Automated scripts or programs using port 43 are effectively a substitute for bulk access. A policy determination on port 43 access is best made in conjunction with a determination on bulk access.
E. Fifth, the best way to stop abuse of ports 43 or 80 is to get data that is valuable to spammers out of the public Whois database. [TASK FORCE 2]
F. Changes to Port 43 are not a substitute for privacy protection.

Attachment:
TF1 statement v1.doc