[dow1tf] RE: TF1
- To: "Neuman, Jeff" <Jeff.Neuman@xxxxxxxxxx>, "'stahura@xxxxxxxx'" <stahura@xxxxxxxx>, "'dow1tf@xxxxxxxxxxxxxx'" <dow1tf@xxxxxxxxxxxxxx>
- Subject: [dow1tf] RE: TF1
- From: "Neuman, Jeff" <Jeff.Neuman@xxxxxxxxxx>
- Date: Mon, 22 Dec 2003 22:09:04 -0500
- Sender: owner-dow1tf@xxxxxxxxxxxxxx
This message is from Paul Stahura from Enom.
Glen, can we make sure that Paul is included on the Task Force e-mails?
From: Paul Stahura [mailto:stahura@xxxxxxxx]
Sent: Monday, December 22, 2003 8:06 PM
To: 'Neuman, Jeff'
Jeff, I do not seem to get any TF list messages.
I have to go to the DNSO website to see the messages.
Am I on the email list?
I want to send the following message:
On the last call we were asked to generate a list of entities that use whois
so that we can contact them and get a list of their needs and justifications (from
Other homework items included generating questions to ask them and providing
a list of methods to prevent data mining.
Here is my quick start for each of the 3 lists (I'm sure other TF members
will have more examples).
Example entity to reach out to   My guess at a possible need/justification
------------------------------   -----------------------------------------
                                 Company profile information
Thomson and Thomson              Monitor whois changes service
Name Intelligence (whois.sc)     Statistical info gathering, public whois service
                                 Disseminate whois info to the public
                                 Disseminate whois info to the public
                                 I'm not sure, but possibly as a public whois service
                                 Possibly as a way to attract domain name registrations or as a public
                                 Improve search rankings
List of some possible questions to ask them:
1) Do you obtain the information via port-43 or via the web or via bulk
licenses (or some other way that is probably indirect)?
2) Do you disseminate the obtained whois information via port-43, or
via the web, or some other method, or not at all?
3) Do you store the information?
a. No, only cached, or for longer periods?
b. If the answer to 3a is not "no", approximately what percentage of all
gTLD whois records do you have stored at any given moment?
4) Provide a short justification, or need, for the whois data.
I've written a (probably) non-exhaustive list of some techniques to prevent
data mining. Each has advantages and disadvantages.
To help understand the list, I've written this quick layman's summary of the
output types of whois data:
There are typically two types of whois data output. They are "port-43" and
"web-based". Port-43 is typically used to communicate information from one
computer to another and web-based is typically used to communicate whois
information from a computer to a human. Yes, it is possible for a human to
get information directly from port-43 without using a web browser and for a
computer to get information via the web (port-80), but that is not typically
the case. Port-43 output is plain ASCII text and is typically easy for a
computer to parse, while web-based output can be in a variety of formats and
is becoming increasingly difficult for computers to read or access (in an
effort to prevent mining; see 2.a.ii below) while remaining easy for a human
to read.
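To make the port-43 description concrete, here is a minimal Python sketch of a port-43 client. It assumes the plain request/response behavior of the whois protocol: open a TCP connection to port 43, send the query followed by CRLF, and read until the server closes the connection. parse_fields is a hypothetical helper (not part of any standard) that shows why "Key: Value" text output is easy for a program to read; real registrar output layouts vary.

```python
import socket


def whois_query(query: str, server: str, port: int = 43, timeout: float = 10.0) -> str:
    """Send one whois query over TCP and return the raw text reply.

    The client sends the query terminated by CRLF, then reads until the
    server closes the connection (which marks the end of the reply).
    """
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # connection closed: reply is complete
                break
            chunks.append(data)
    # Replies are text; replace any stray undecodable bytes instead of failing.
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_fields(reply: str) -> dict:
    """Hypothetical helper: turn 'Key: Value' lines into a dict.

    Lines starting with '%' or '#' are treated as comments and skipped;
    for repeated keys (e.g. several name servers) the first value wins.
    """
    fields = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and not key.startswith(("%", "#")):
            fields.setdefault(key.strip(), value.strip())
    return fields
```

Usage would look like parse_fields(whois_query("example.com", "whois.registrar.example")), where the server name is a placeholder. A few lines like these are all a mining program needs, which is why the port-43 interface is the easier one to harvest.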
How to prevent mining (list of methods, without listing advantages and
disadvantages):
1. Limit the rate of information flow
a. For example, by limiting the number of queries per day from a
particular IP address (web-based and port-43 outputs)
b. By somehow invoking a cost to the requestor
2. Limit the usefulness of the information (usually to automated
programs, not to humans)
a. Due to its output format
i. port-43, for example, by changing the format of the output so that it
becomes more difficult for a computer to parse
ii. web-based whois, for example, by
1. outputting the whois information as a GIF so that it becomes more
difficult for a computer to parse (OCR would be required)
2. testing whether the requester is human (see the Scientific American
article "Baffling the Bots")
b. Due to its timeliness
for example, by periodically changing the email addresses that are output:
output a valid email address that forwards to the real address and change
it every few days, so that previously output addresses become useless to
spammers.
3. Limit the magnitude and/or type of information outputted
a. For example, output no information at all, or only the admin contact
organization name
4. Limit the "access" (not defined here, but could be flow, usefulness,
magnitude/type or a combination) to the information
a. By, for example, allowing only trusted parties access
b. Or by allowing access to humans but not to computers, for example by
transmitting the information only in non-text formats such as GIFs.
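Method 1.a (limit queries per day from a particular IP address) could be sketched roughly as follows. This is a minimal in-memory illustration, assuming a single server process and a fixed daily quota; the class name and quota are hypothetical. The same check would sit in front of both the web-based and port-43 outputs.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


class DailyQueryLimiter:
    """Hypothetical limiter: at most max_per_day queries per IP per calendar day.

    Counters are kept in memory and all reset when the date changes.
    """

    def __init__(self, max_per_day: int) -> None:
        self.max_per_day = max_per_day
        self._day: Optional[date] = None
        self._counts = defaultdict(int)  # IP address -> queries answered today

    def allow(self, ip: str, today: Optional[date] = None) -> bool:
        """Return True (and count the query) if this IP is still under its quota."""
        if today is None:
            today = date.today()
        if today != self._day:  # first query of a new day: start fresh
            self._day = today
            self._counts.clear()
        if self._counts[ip] >= self.max_per_day:
            return False  # refused; refused queries are not counted
        self._counts[ip] += 1
        return True


if __name__ == "__main__":
    limiter = DailyQueryLimiter(max_per_day=3)
    day = date(2003, 12, 22)
    print([limiter.allow("192.0.2.1", day) for _ in range(4)])  # [True, True, True, False]
```

A miner can spread queries over many IP addresses, so a per-IP quota slows harvesting rather than stopping it.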
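Method 2.b (periodically changing the published email address) could work by deriving the forwarding address from a secret key, the real address, and the current time period, so no per-address table has to be rotated by hand. This is a sketch under assumptions: the HMAC-based derivation, the 3-day period, the secret, and the forwarding domain are all hypothetical choices, not an existing registrar scheme.

```python
import hashlib
import hmac
from datetime import date

ROTATION_DAYS = 3  # assumption standing in for "every few days"
FORWARD_DOMAIN = "whois-forward.example"  # hypothetical forwarding domain
SECRET = b"registrar-private-key"  # hypothetical; would be kept out of source code


def period_index(day: date, rotation_days: int = ROTATION_DAYS) -> int:
    """Number of whole rotation periods since 1970-01-01."""
    return (day - date(1970, 1, 1)).days // rotation_days


def alias_for(real_address: str, day: date) -> str:
    """Forwarding address to publish in whois output on the given day."""
    msg = f"{real_address.lower()}|{period_index(day)}".encode()
    tag = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:16]
    return f"{tag}@{FORWARD_DOMAIN}"


def resolve(alias: str, known_addresses, day: date):
    """Mail-server side: map a currently valid alias back to the real address.

    Aliases from earlier periods no longer match anything, so addresses
    harvested before the last rotation stop delivering mail.
    """
    for addr in known_addresses:
        if hmac.compare_digest(alias_for(addr, day), alias):
            return addr
    return None
```

The linear scan in resolve keeps the sketch short; a real forwarder would more likely precompute a table of the current period's aliases.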
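Methods 3.a (output only, for example, the admin contact organization name) and 4.a (full access only for trusted parties) can be combined in a single output step. In this minimal sketch the field labels and the allowlist of trusted IP addresses are hypothetical, and identifying a "trusted party" by IP address is itself an assumption.

```python
# Hypothetical field labels; real whois output formats vary by registrar.
PUBLIC_FIELDS = ("Domain Name", "Admin Organization")
# Hypothetical allowlist of trusted parties (method 4.a), keyed by IP address.
TRUSTED_REQUESTERS = {"198.51.100.7"}


def render_record(record: dict, requester_ip: str) -> str:
    """Format a whois record as port-43 style text, limited by requester.

    Trusted requesters see every field; everyone else sees only
    PUBLIC_FIELDS (method 3.a).
    """
    if requester_ip in TRUSTED_REQUESTERS:
        names = list(record)
    else:
        names = [name for name in PUBLIC_FIELDS if name in record]
    return "".join(f"{name}: {record[name]}\r\n" for name in names)
```

Including the domain name in the public fields is an assumption; it only echoes what the requester already asked for.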