Re: [ga] Whois Accuracy Study Launches
- To: Debbie Garside <debbie@xxxxxxxxxxxxxxxxxx>
- Subject: Re: [ga] Whois Accuracy Study Launches
- From: kent <kent@xxxxxxxxx>
- Date: Wed, 10 Jun 2009 12:12:26 -0700
On Wed, Jun 10, 2009 at 06:19:11PM +0100, Debbie Garside wrote:
> After a very quick read of this document (and I do mean quick so maybe I
> have misunderstood the proposed methodology), I am perhaps a little worried
> about the sampling method/size/group for this proposed survey.
> In my experience one needs to have a sample size of at least 400 in order to
> be able to glean anything sensible from the data. Personally I would like to
> see sample sizes of 800 which I believe would give +/-3.3% margin of error
> at the 95% confidence level. It is widely thought that sample sizes above
> 1200 result in few data variations so 800 is middle of the road if my memory
> serves me correctly. Although the overall sample size exceeds 1200 I think the
> survey should be split into 10-15 separate surveys of at least 400 domain
> names for any useful information to be gleaned from the countries
That would be *enormously* more expensive, given the stringent verification
methodology (from page 9):
1. The address given is a valid postal address, as specified in the
2. The entity named as the registrant is independently associated with
the address given; that is, there is some evidence other than the WHOIS
entry that an entity of that name can be contacted at the address given,
3. The registrant, once contacted using independently obtained contact
information, acknowledges that they are the registrant of the domain
name, and (if needed given the similarity between many domain names)
recognizes the description of the web page associated with the domain
There is a great deal of "on the internet no one knows you're a dog"
uncertainty -- this study is the first one I have ever seen that will
actually address that issue head on, and I'm really interested to see what
the results will be :-)
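For anyone who wants to check the sample-size figures quoted above, here is a
rough sketch of the usual margin-of-error calculation. It is not taken from
the study document. It assumes simple random sampling, the worst-case
proportion p = 0.5, a 95% confidence level (z = 1.96), and no finite-population
correction; the function name is just for illustration.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the normal-approximation confidence interval
    for a proportion p estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1.0 - p) / n)

# Sample sizes mentioned in the thread, plus the 3-per-country case.
for n in (3, 400, 800, 1200):
    print(f"n={n:5d}  +/-{100 * margin_of_error(n):.1f}%")
```

Under those assumptions 400 gives about +/-4.9%, 800 about +/-3.5%, and 1200
about +/-2.8%, while a sample of 3 gives well over +/-50%, which is why nobody
should read anything into a single country's handful of names.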
> As this proposed methodology would seem to focus on samples linked to
> Countries, perhaps it would be better to choose a number of countries from
> the 5 continents and then choose a large enough sample size for each country
> to give statistically valid information. I would recommend a sample size of
> at least 400 and perhaps 2 or 3 countries from each continent. It might be
> interesting to choose the most and least successful in terms of internet
> usage/domain name registration with an additional one representing an
> average country's use or the choice of country could be based on GDP.
> It is my opinion that the stratified sample as proposed which results in the
> analysis of Whois information for 3 domain names from one country will tell
> you absolutely nothing about the accuracy of that country's Whois
> information as a whole.
Per-country accuracy statistics are not the goal of the study, only
That is, after the study you will be able to say things like "10% of all
whois data from .com and .net is 'obviously bogus', for our specific
definition of 'obviously bogus' and for our accuracy criteria". But you
won't be able to say "10% of whois information for com/net domains registered
in Brazil is bogus", or "whois accuracy is better in England than it is in
The goals and limitations of the study are very carefully spelled out in the
document. The purpose of the stratified sample design is to minimize
sampling costs, not to get per-country statistics.
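To illustrate the general idea (this is a generic textbook sketch, not the
estimator NORC actually uses, which I am not reproducing here): in a
stratified design each stratum's sample proportion is weighted by that
stratum's share of the population, and the weighted sum gives the overall
figure. The strata, weights, and counts below are invented purely for
illustration, and the variance formula ignores finite-population corrections.

```python
import math

def stratified_proportion(strata, z=1.96):
    """Overall proportion and margin of error from a stratified sample.

    strata: list of (weight, sample_size, flagged_count) tuples, where the
    weights are each stratum's share of the population and sum to 1.
    """
    estimate = 0.0
    variance = 0.0
    for weight, n, flagged in strata:
        p = flagged / n                      # within-stratum proportion
        estimate += weight * p               # weighted contribution
        variance += weight ** 2 * p * (1.0 - p) / n
    return estimate, z * math.sqrt(variance)

# Hypothetical strata: (population share, names sampled, names flagged).
example = [(0.6, 500, 50), (0.3, 400, 40), (0.1, 300, 30)]
overall, moe = stratified_proportion(example)
print(f"overall: {100 * overall:.1f}% +/- {100 * moe:.1f}%")
```

The overall estimate can be reasonably tight even when an individual stratum
contributes too few names to say anything about that stratum by itself.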
> But the last time I did a nationwide survey was in 2002 (Opportunity Wales
> State of the Nation) so I am a little rusty on all this.
NORC does studies like this on an ongoing basis -- it's what they do.