Sunday, 28 April 2013

Legal tech: Scraping for data


Scraping for data has been around for a while, and even before the Internet big companies were compiling data on people.  Remember those Nielsen boxes on top of television sets?  Or perhaps the surveys you receive in the mail about products you use?  Well, scraping for data now takes it up a notch.  Many companies regularly use data scraping on job applicants and on the competition.  Big business also uses the technique to learn about the buying habits of consumers.  With sophisticated software and powerful algorithms, enormous numbers of pages can be processed and evaluated very quickly, and these spiders crawl the Internet trying to learn as much about you as possible.

One company even filed for a patent on a new software program and business method for obtaining the real name of a person who uses a pseudonym online.  That company, PeekYou LLC, out of New York, developed an algorithm that checks gender, address information, date of birth, email cross-references, phone numbers and other information to try to determine the real name behind an online pseudonym.  So wks1070@gmail.com may actually be John Smith from Chicago (I made up wks1070@gmail.com and John Smith, so any similarity is purely coincidental).  This is a very interesting concept, and I am sure it will spark robust debate among lawyers and raise ethical concerns.
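To make the idea concrete, here is a minimal Python sketch of attribute matching of the general kind described above.  It is not PeekYou's patented method, whose details are not given here.  The `Profile` fields, the weights and the threshold are all hypothetical values chosen purely for illustration, and the data is the made-up example from the paragraph above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Profile:
    """A bundle of attributes; any field may be unknown (None)."""
    name: Optional[str] = None
    gender: Optional[str] = None
    city: Optional[str] = None
    birth_date: Optional[str] = None
    email: Optional[str] = None
    phone: Optional[str] = None


# Hypothetical weights: rarer, more identifying attributes count for more.
WEIGHTS = {"gender": 0.5, "city": 1.0, "birth_date": 2.0, "phone": 3.0, "email": 4.0}


def match_score(observed: Profile, candidate: Profile) -> float:
    """Sum the weights of every attribute known on both sides that agrees."""
    score = 0.0
    for field, weight in WEIGHTS.items():
        a, b = getattr(observed, field), getattr(candidate, field)
        if a is not None and b is not None and a.strip().lower() == b.strip().lower():
            score += weight
    return score


def best_match(observed: Profile, candidates: Iterable[Profile],
               threshold: float = 4.0) -> Optional[Profile]:
    """Return the highest-scoring candidate, or None if nothing reaches the threshold."""
    best = max(candidates, key=lambda c: match_score(observed, c), default=None)
    if best is None or match_score(observed, best) < threshold:
        return None
    return best


# Attributes gathered about the pseudonymous account (made-up data).
observed = Profile(gender="M", city="Chicago", email="wks1070@gmail.com")

# Records with known real names (also made up).
candidates = [
    Profile(name="John Smith", gender="m", city="Chicago", email="wks1070@gmail.com"),
    Profile(name="Jane Doe", gender="f", city="Chicago"),
]

match = best_match(observed, candidates)
print(match.name if match else "no confident match")
```

The threshold matters in a sketch like this: matching on city and gender alone would fit thousands of people, so a weak score is treated as no match rather than as a name.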

So, back to the original question: is it worth spending the money for data scraping on a case that is worth a large amount?  Is it ethically acceptable?  Trying to learn the identity of someone called wks1070@gmail.com seems ethically acceptable, especially if that person is posting trade secret information stolen from your company, or false information that is the subject of a lawsuit.  Yes, you can subpoena Google, but what if the person only accesses that account from Starbucks?  In that case the access records would likely point to a shared public Wi-Fi network rather than to an individual.

I would be interested in hearing from any readers who have used data scraping techniques or PeekYou.  I would like to write follow-up articles using real examples, minus the confidential details of course.  One thing is for certain: the genie is out of the bottle on people’s data, and it isn’t going back in anytime soon.

Source: iln.isba.org/2010/10/21/legal-tech-scraping-for-data

Note:


Roze Tailer is an experienced web scraping consultant and writes articles on LinkedIn email scraping, LinkedIn profile scraping, TripAdvisor data scraping, lawyers data scraping, Yellow Pages data scraping and product information scraping.
