ICANN/GNSO GNSO Email List Archives



[dow1tf] Re: WHOIS and Internet research

  • To: dow1tf@xxxxxxxxxxxxxx
  • Subject: [dow1tf] Re: WHOIS and Internet research
  • From: Wendy Seltzer <wendy@xxxxxxxxxxx>
  • Date: Tue, 23 Dec 2003 06:38:44 -0800
  • Sender: owner-dow1tf@xxxxxxxxxxxxxx

I asked Ben Edelman, a Fellow at the Berkman Center, about his use of WHOIS data in his Internet research and how that research would be affected by the loss of automated access. His projects have included studies of registrant compliance with TLD restriction policies, patterns of bulk registration, and the accuracy of registrant data. See <http://cyber.law.harvard.edu/edelman.html>. I pass along his full response below. He also includes some URLs for other Internet researchers.


Others doing this kind of work: <http://www.zooknic.com>,
<http://www.registrarstats.com>, <http://www.sotd.info>,
<http://www.dailychanges.com>.  But I don't think these folks use Whois
data, for obvious reasons (e.g. they can't get it).

The rise of GIF-image output and other restrictions on access to HTTP Whois, when combined with disabling of port 43 Whois, essentially puts an end to researchers' ability to do the kind of work posted to <http://cyber.law.harvard.edu/tlds/001/#topreg>, <http://cyber.law.harvard.edu/people/edelman/name-restrictions/>, and <http://cyber.law.harvard.edu/people/edelman/dotus>. Indeed, since HTTP Whois is already pretty well locked down and since some registrars have already disabled port 43 (contrary to policy, to be sure), this research has already become sufficiently impractical that I generally don't spend the time trying at this point.
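
For readers unfamiliar with port 43 Whois, here is a minimal Python sketch of the kind of automated, machine-readable lookup at issue. It is not the tooling used for the studies linked above. The function names, the server name in the usage note, and the "Key: value" parsing are all illustrative assumptions, and reply formats differ from registrar to registrar.

```python
import socket


def build_query(domain):
    """Build a port 43 Whois request: the query text followed by CRLF."""
    return domain.strip().encode("ascii") + b"\r\n"


def whois_query(domain, server, timeout=10.0):
    """Send one query to a Whois server on TCP port 43 and return the raw reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(build_query(domain))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_fields(reply):
    """Collect 'Key: value' lines into a dict of lists (best effort only;
    assumes a colon-separated layout, which not every registrar uses)."""
    fields = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            fields.setdefault(key.strip(), []).append(value.strip())
    return fields
```

A call such as `parse_fields(whois_query("example.com", "whois.example.net"))` (hypothetical server name) would return the reply as text fields a program can tabulate. That step is what becomes impractical when results are served only as images or when port 43 is disabled.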

I agree that an end to machine-readable on-demand data favors those with
bulk licenses of Whois data, e.g. T&T and MarkMonitor, as well as those who
can afford to hire these companies.  This preference comes at the expense of
independent data analysis folks who surely don't have $10k per registrar per
year.  My intuition is that the result will be higher prices from T&T and
MM, probably along with reduced innovation.  Certainly there will be fewer
independent reports as to the sorts of domain name phenomena that can only
be discovered using Whois data.

Some possible policy responses:

1) Require that port 43 Whois remain available to bona fide researchers.
Policy could require that researchers demonstrate an affiliation with an
accredited university or organization -- or not, depending on desired
treatment of independent bona fide researchers.  (I would think most, but
not all, will hold some institutional affiliation.)  Access could be
provided on a by-IP basis or on a passworded basis.  This is probably not
unduly burdensome on registrars because I gather registrars will continue
operating port 43 Whois servers for their own internal purposes (authorizing
transfers, etc.).

2) Require that bulk Whois data be made available to bona fide researchers
on preferential terms (i.e. considerably less than $10k/registrar-year max
currently specified in registrar contracts with ICANN).  Given the sensitive
nature of this data, it would still be important to make recipients sign
contracts as to use and distribution of the data.  But to facilitate
meaningful research, the contract shouldn't prohibit partial republication
of the sort linked above.

3) Require that bulk Whois data and/or port 43 Whois data be made available
to some notion of "startup" Whois searching services that are unable to pay
the higher fees.  Fees could be a function of these companies' initial
revenues, perhaps, or fees could be indexed to registrars' actual marginal
costs of distributing the data to licensees (e.g. bytes downloaded from FTP

--
Wendy Seltzer -- wendy@xxxxxxxxxxx
Staff Attorney, Electronic Frontier Foundation
Fellow, Berkman Center for Internet & Society at Harvard Law School
http://cyber.law.harvard.edu/seltzer.html
Chilling Effects: http://www.chillingeffects.org/
