ICANN/GNSO GNSO Email List Archives



RE: [ga] Whois Accuracy Study Launches

  • To: "'kent'" <kent@xxxxxxxxx>
  • Subject: RE: [ga] Whois Accuracy Study Launches
  • From: "Debbie Garside" <debbie@xxxxxxxxxxxxxxxxxx>
  • Date: Wed, 10 Jun 2009 21:35:23 +0100

Hi Kent

Responses inline...

> > In my experience one needs to have a sample size of at least 400 in
> > order to be able to glean anything sensible from the data. Personally
> > I would like to see sample sizes of 800 which I believe would give
> > +/-3.3% margin of error at the 95% confidence level. It is widely
> > thought that sample sizes above 1200 result in few data variations so
> > 800 is middle of the road if my memory serves correct. Although the
> > overall sample size exceeds 1200 I think the survey should be split
> > into 10-15 separate surveys of at least 400 domain names for any
> > useful information to be gleaned from the countries scrutinised.
> That would be *enormously* more expensive, given the stringent
> verification methodology (from page 9):
>     1.  The address given is a valid postal address, as specified in
>         the above definition
>     2.  The entity named as the registrant is independently associated
>         with the address given; that is, there is some evidence other
>         than the WHOIS entry that an entity of that name can be
>         contacted at the address given, and
>     3.  The registrant, once contacted using independently obtained
>         contact information, acknowledges that they are the registrant
>         of the domain name, and (if needed given the similarity between
>         many domain names) recognizes the description of the web page
>         associated with the domain name.

I don't think this is overly stringent, and much of the work being done,
e.g. sourcing databases per country and running the list against them for
matches, will be just as cheap to do for 400 or 800 records as it is for 3.

In fact, all of the companies I have dealt with in the past (admittedly only
in the UK, but I have been doing this since 1988) have a minimum charge that
usually equates to the cost of processing 1,000 or 4,000 records.
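For reference, the worst-case margin of error at the 95% confidence level for a sample of n is z * sqrt(p * (1 - p) / n), taking p = 0.5 and z = 1.96. A quick Python sketch (illustrative only, not part of the study's methodology) suggests the figure for n = 800 works out nearer +/-3.5% than +/-3.3%:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Worst-case (p = 0.5) margin of error at the 95% confidence level."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (400, 800, 1200):
    print(n, round(margin_of_error(n) * 100, 1))
# 400 -> 4.9, 800 -> 3.5, 1200 -> 2.8 (percentage points)
```

The diminishing returns above ~1,200 mentioned earlier are visible here: doubling from 400 to 800 buys about 1.4 points, while the next 400 buys only 0.7.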

> There is a great deal of "on the internet no one knows you're a dog"
> uncertainty -- this study is the first one I have ever seen
> that will actually address that issue head on, and I'm really
> interested to see what the results will be :-)

I just think that more could be gleaned from this study that may prove
useful in policy making.  How inaccurate the Whois database is as a whole is
rather boring and useless information.  What does it matter whether it is
10% or 20% inaccurate overall, or 3.3% for .com or 0.9% for .org (as identified by

If Russia has a higher level of .com inaccuracies than the UK, what do you
think that would tell you?  If you link the levels of inaccuracies to GDP
data, do you think you may find a trend?  Possibly!  What do 3.3%
inaccuracies within .com and 0.9% within .org tell you?  Is there a
difference in the registration procedures for these, or is it that there is
more scope for phishing with a .com?
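Whether two countries' inaccuracy rates actually differ can be checked with a standard two-proportion z-test once per-country samples are large enough. A minimal Python sketch, using hypothetical counts (60/400 vs 40/400, not figures from the study):

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for the difference between two sample proportions."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)  # pooled proportion under the null hypothesis
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical: 60 of 400 records inaccurate in country A vs 40 of 400 in B
z = two_proportion_z(60, 400, 40, 400)
print(round(z, 2))  # ~2.14, i.e. |z| > 1.96: significant at the 5% level
```

With only 3 domains per country, as in the proposed stratified design, no such comparison is possible, which is the point being argued here.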

> > As this proposed methodology would seem to focus on samples linked
> > to Countries, perhaps it would be better to choose a number of
> > countries from the 5 continents and then choose a large enough
> > sample size for each country to give statistically valid
> > information. I would recommend a sample size of at least 400 and
> > perhaps 2 or 3 countries from each continent.  It might be
> > interesting to choose the most and least successful in terms of
> > internet usage/domain name registration with an additional one
> > representing an average country's use, or the choice of country
> > could be based on GDP.
> >
> > It is my opinion that the stratified sample as proposed which
> > results in the analysis of Whois information for 3 domain names
> > from one country will tell you absolutely nothing about the
> > accuracy of that country's Whois information as a whole.
> Of course.
> Per country accuracy statistics are not the goal of the
> study, only overall accuracy.

But there does seem to be a lot of emphasis on country-profile information
within this methodology.

> That is, after the study you will be able to say things like
> "10% of all whois data from .com and .net is 'obviously
> bogus', for our specific definition of 'obviously bogus' and
> for our accuracy criteria".  But you won't be able to say
> "10% of whois information for com/net domains registered in
> Brazil is bogus", or "whois accuracy is better in England
> than it is in France."

What a shame... :-)

> The goals and limitations of the study are very carefully
> spelled out in the document.  The purpose of the stratified
> sample design is to minimize sampling costs, not to get
> per-country statistics.

In which case, stick to studying Whois info in the US for the purposes of
this study (it's easier and cheaper) and devise a more in-depth
strategy/methodology next time.

> > But the last time I did a nationwide survey was in 2002 (Opportunity
> > Wales State of the Nation) so I am a little rusty on all this.
> NORC does studies like this on an ongoing basis -- it's what they do.

That's as may be, but their web page is down so I can't check them out:
www.norc.uchicago.edu/.  The last university I worked with sent me 400
emails asking me how to do their survey!  Undergraduates playing at real
business... don't you just hate them! ;-)  By the way, the terminology for
conducting the first 100 interviews and then redesigning the survey is a
"pilot".



> Kent
