Hedge funds are watching a key lawsuit involving LinkedIn to see if they can spend billions on web-scraped data

Stephen Lam/Getty Images

A legal case involving Jeff Weiner's LinkedIn will determine the future for web-scraping.

  • Web scraping has become integral to many hedge funds' data-collection processes, with one out of every 20 webpage visits in 2018 coming from a scraping bot run by a fund or sell-side research institution, according to a recent research report.
  • Hedge funds are expected to spend nearly $2 billion in 2020 on the collection and storage of data that has been scraped from the web, the report notes, but a lawsuit involving LinkedIn could change what funds are legally allowed to collect.

The internet is crawling with bots, and many of them belong to hedge funds. But a lawsuit involving LinkedIn could change what funds are legally allowed to collect.

One out of every 20 website visits is from a fund or sell-side research firm that is scraping the page for info, according to a report from Opimas. Hedge funds have built the tremendous amount of data they scrape into their systems, and will spend nearly $2 billion on web-scraping alone in 2020, a sliver of the overall money that is pouring into the exploding alternative-data scene.

But the pedal-to-the-metal approach of scaling up and building out web-scraping units by hedge funds may be for naught as the courts try to create a framework for what is allowed on the web.

In August 2017, a judge in the Northern District of California ruled that public LinkedIn pages can be scraped by hiQ, a company that used the data trawled by its bots to inform employers about their employees' web activity, despite the fact that LinkedIn's terms of use forbid the use of any web-scraping bot.

The case, which is currently in the appeals process in the 9th Circuit, is being watched closely by lawyers and funds to determine what the future will be for an increasingly important part of hedge funds' investment process.

With the judge ruling against LinkedIn, funds are still ramping up their web-scraping for now, according to Peter Greene, vice chair of the investment management group at law firm Lowenstein Sandler.

"I don't think it is scaring folks from using it," Greene said of LinkedIn's call to stop scraping. The most significant change has come from hedge funds' compliance departments that are tracking litigation related to web-scraping and are generally more knowledgeable and sophisticated around digital data collection, he said.

A backlash has mounted over perceived invasions of privacy

But as a backlash has mounted over perceived invasions of privacy by tech companies like Facebook and Google, hedge funds need to be prepared to defend and possibly alter their data strategies if there is a sudden pullback on what is legally allowed to be scraped, lawyers say.

"The key is determining what does it mean to be public on the internet," Greene said.

Just a couple of years ago, he said, it was only the biggest firms that understood the ins and outs of the litigation around the space, but now it's "all sizes of managers."

And beyond the legal risk, the headline risk that comes with collecting and using large swaths of online data needs to be top-of-mind as well, said Stacey Brandenburg, a lawyer with ZwillGen, in a presentation at data company Quandl's annual conference last month.

Web-scraping in particular is an area where there are motivated third parties - the websites - that are unhappy with the practice.

"When you're developing a web-scraping program, you want to think through and monitor carefully what your web-scrapes look like and how they are being responded to by a site so you could be on notice to the point where it is unequivocally clear that a site has revoked your authorization and doesn't want you to be there, because the next step is to send a cease-and-desist and potentially to sue you," Brandenburg said.
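One way to picture the kind of monitoring Brandenburg describes is a small check that looks at a site's robots.txt and at how the site has recently responded to a scraper. The sketch below is illustrative only and is not drawn from any fund's actual practice: the function names, the set of status codes treated as pushback, and the 20% threshold are all assumptions, and neither robots.txt nor HTTP status codes settle the legal question of authorization.

```python
from urllib.robotparser import RobotFileParser

# Assumption: these HTTP statuses are treated as signs the site is pushing back.
BLOCK_STATUSES = {401, 403, 429}


def robots_allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def scrape_signal(robots_txt, user_agent, url, recent_statuses, threshold=0.2):
    """Classify how a site is responding to a scraper (hypothetical policy).

    Returns "stop" if robots.txt disallows the URL, "escalate" if the share
    of blocked responses reaches the threshold, and "ok" otherwise.
    """
    if not robots_allows(robots_txt, user_agent, url):
        return "stop"
    blocked = sum(1 for status in recent_statuses if status in BLOCK_STATUSES)
    if recent_statuses and blocked / len(recent_statuses) >= threshold:
        return "escalate"
    return "ok"


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    print(scrape_signal(robots, "fund-bot", "https://example.com/jobs", [200, 200, 403, 429]))
```

In this sketch an "escalate" result would be a cue to hand the matter to compliance rather than keep scraping; what counts as a clear revocation of authorization is a question for counsel, not for code.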

Fund managers are getting overwhelmed

A pullback on the amount of data that can be scraped could potentially be a good thing for managers that are being overwhelmed by the amount of information coming in, said Fidelity's head of artificial intelligence and advanced data John Avery at an industry conference earlier this year.

"If anything, I think folks are scraping more than they need," said Evan Reich, a data strategist at $20 billion BlueMountain Capital Management. He warned against overreliance on web-scraping data that isn't properly filtered.

If a fund accidentally pulls in data that it can't legally use - like personally identifiable credit card info - and builds models using it, "then it may not be able to be purged," Reich said.

"Ideally you never want to have, 10 years down the road, someone says purge my data, and you'd rather it not be something that's impossible, or extremely difficult, to purge," Reich said.
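The filtering Reich has in mind could take many forms. As one narrow illustration, the sketch below redacts strings that look like payment card numbers before scraped text is stored, using the standard Luhn checksum to cut down on false positives. The regular expression, function names, and "[REDACTED]" marker are assumptions made for this example, and a filter like this covers only one kind of sensitive data.

```python
import re

# Assumption: card-like strings are 13 to 19 digits, optionally separated
# by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_card_numbers(text: str) -> str:
    """Replace Luhn-valid card-like numbers in text with a placeholder."""
    def _replace(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        return "[REDACTED]" if luhn_valid(digits) else match.group()

    return CARD_CANDIDATE.sub(_replace, text)


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a widely used test number, not a real card.
    print(redact_card_numbers("Card 4111 1111 1111 1111 on file"))
```

Redacting before storage, rather than after a model has been trained, is the point of the design: it keeps the data from ever entering the systems that would later be hard to purge.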

"No dataset is so good that it is worth betting the firm on."
