There’s No Such Thing as a “Permanent” Local Data Record


Yext

Oct 17, 2013

9 min

A common misperception in the local search and data industry is that publishers have a single "permanent" or "master" record for a given business listing. The misconception goes deeper: many people think that claiming a business causes a publisher to update a permanent record for that listing.

However, if publishers really had a single permanent record, why would so many people have problems updating listings? It's because there's actually no master record.

Publishers consider many different signals and sources when deciding which information to show in their experiences for a particular business. Claims are just one source of many.

Let's take a look at how local data works at most of the world's major publishers. It's actually pretty complicated.

We'll start with a simple example of a fictional local search publisher called Bingo that uses 5 local data sources: an aggregator, a government data file, results from crawling the web, claims, and consumer signal.

Bingo stores all the local data from each of these 5 sources. For each data element, the sources are ranked by "trust". Here are Bingo's rankings for 5 elements: Name, Address, Phone, Web Site, and whether the business still exists:
| Source | Name | Address | Phone | Web Site | Open |
| --- | --- | --- | --- | --- | --- |
| Aggregator | 5 | 2 | 3 | 4 | 4 |
| Government Data File | 4 | 3 | 1 | 2 | 3 |
| Web Crawl Results | 3 | 4 | 2 | 3 | 5 |
| Claims | 1 | 1 | 5 | 1 | 2 |
| Consumer Signal | 2 | 5 | 4 | 4 | 1 |

Periodically – let's say weekly – Bingo runs a process to conflate these 5 sources to build a current "view": the local data file that actually appears on their live site.

Let's say the conflation process is running to decide what info to show for a local business called PizzaLand.

Here's the data that's contained in each source about PizzaLand:
| Source | PizzaLand |
| --- | --- |
| Aggregator | PizzaLand<br>212-123-0000<br>44 Broad Street, New York, NY 10011<br>www.pizzaland.com |
| Gov't Data File | Pizzaland, Inc<br>44 Broad Street Suite #75, New York, NY 10011<br>**212-123-0000**<br>Incorporation Date: 4/30/2013 |
| Web Crawl Results | PizzaLand<br>800-321-1234<br>44 Broad Street, New York, NY 10011<br>www.pizzaland.com |
| Claims | **PizzaLand**<br>**44 Broad Street, New York, NY 10011**<br>888-331-3110<br>**www.pizzaland.com**<br>Claim date: 5/14/13 |
| Consumer Signal | 1 user marked business as closed on 5/17/13 |

The bolded elements are those that are ranked highest and will therefore show in the next generated view.

(Side note: To simplify the illustration, I've skipped the first step, which is to actually match up the PizzaLand locations across each source. This is actually super hard! What if there are no consistent identifying elements across sources to match them up? Or what if there are multiple PizzaLand locations in a source? Entire companies like Locationary have built sophisticated matching technology to handle this.)
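To give a feel for why matching is hard, here's a hypothetical sketch of how a naive matcher might start: normalize names and key on the street number. Everything here – the function names and the normalization rules – is invented for illustration; real matching systems use geocoding, phonetic encodings, and learned similarity models, and handle far messier cases.

```python
import re

def normalize(text):
    """Lowercase, strip punctuation, and drop common corporate suffixes."""
    text = re.sub(r"[^a-z0-9 ]", "", text.lower())
    for suffix in (" inc", " llc", " corp"):
        if text.endswith(suffix):
            text = text[: -len(suffix)]
    return " ".join(text.split())

def match_key(record):
    """A crude blocking key: normalized business name plus street number."""
    street_number = record["address"].split()[0]
    return (normalize(record["name"]), street_number)

a = {"name": "PizzaLand", "address": "44 Broad Street"}
b = {"name": "Pizzaland, Inc", "address": "44 Broad Street Suite #75"}
print(match_key(a) == match_key(b))  # True: the two records link up
```

Even this toy version shows the fragility: a typo in the street number, or a second PizzaLand location on the same street, and the link breaks or links the wrong records.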

Assuming Bingo's algorithm can actually match up these locations correctly, the next challenge: which data sources do they choose to actually show in the view on their live site?

Most publishers use a ranking algorithm that includes factors like source quality and recency of update to determine which fields to show. Typically this happens at the element level – so a phone number could come from one source and an address from another.

In this case, Bingo ranks claims that have occurred in the past 3 months as the trusted source for name and address, but since a lot of businesses use tracked phone numbers in their claim (which Bingo tries to avoid), they rank the government source highest for phone number.

But, a consumer marked it as closed! Should Bingo trust a single consumer? Maybe it's PizzaLand's arch nemesis down the street who marked it as closed. Bingo's algorithm does not consider a single user vote sufficient to mark a location as closed.

So the final output in the view might look something like:

PizzaLand
44 Broad Street
New York, NY
212-123-0000
www.pizzaland.com
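The element-level selection that produces a view like this can be sketched in a few lines. This is a hypothetical illustration of the trust-rank idea, not Bingo's (or any real publisher's) actual algorithm; the dictionaries simply encode the example data from the tables above.

```python
# Trust ranks per source per element: lower number = more trusted.
TRUST = {
    "aggregator":      {"name": 5, "address": 2, "phone": 3, "website": 4},
    "government":      {"name": 4, "address": 3, "phone": 1, "website": 2},
    "web_crawl":       {"name": 3, "address": 4, "phone": 2, "website": 3},
    "claims":          {"name": 1, "address": 1, "phone": 5, "website": 1},
    "consumer_signal": {"name": 2, "address": 5, "phone": 4, "website": 4},
}

# What each source actually knows about PizzaLand (some fields are missing).
SOURCES = {
    "aggregator": {"name": "PizzaLand", "phone": "212-123-0000",
                   "address": "44 Broad Street, New York, NY 10011",
                   "website": "www.pizzaland.com"},
    "government": {"name": "Pizzaland, Inc", "phone": "212-123-0000",
                   "address": "44 Broad Street Suite #75, New York, NY 10011"},
    "web_crawl":  {"name": "PizzaLand", "phone": "800-321-1234",
                   "address": "44 Broad Street, New York, NY 10011",
                   "website": "www.pizzaland.com"},
    "claims":     {"name": "PizzaLand", "phone": "888-331-3110",
                   "address": "44 Broad Street, New York, NY 10011",
                   "website": "www.pizzaland.com"},
    "consumer_signal": {},  # no structured NAP data, just a "closed" vote
}

def build_view(sources, trust):
    """Per element, take the value from the most-trusted source that has one."""
    view = {}
    for element in ("name", "address", "phone", "website"):
        candidates = [
            (trust[src][element], data[element])
            for src, data in sources.items()
            if element in data and element in trust.get(src, {})
        ]
        if candidates:
            view[element] = min(candidates)[1]  # lowest rank number wins
    return view

view = build_view(SOURCES, TRUST)
print(view["name"])   # "PizzaLand" -- claims is ranked 1 for name
print(view["phone"])  # "212-123-0000" -- government is ranked 1 for phone
```

Run it, and the output matches the final view above: name, address, and web site come from the claim, while the phone number comes from the government file.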

As data in the sources change, or Bingo tweaks their trust rank, whenever Bingo runs their process, the info that appears in the live view changes as well.

A great advantage of this approach for a publisher is that they can easily pull in new sources (or remove sources) and rebuild their view without an archeological dig.

But the key point is that "claims" are not the permanent record. They are just one source among many, often ranked highly in the beginning but losing trust quickly over time. Claims are a tricky business for publishers. Many of them come from brand new businesses. Brand new businesses fail at an astoundingly high rate. And how many of them notify publishers when they close? Basically none. So, for publishers, claims are a double-edged sword. Primarily, they are used as a lead generation source for their local sales efforts.

Furthermore, this is a vastly oversimplified example. In the real world, publishers take in hundreds of sources. They deal with many duplicate listings. They deal with closed locations. They deal with fake claims. They deal with constantly changing data. They re-rank sources.

Simply put, the entire process is a mess, which is why we invented Yext – an overlay on top of the madness.

An Overlay isn't a Problem. It's the Solution.

"Yext is not a permanent solution" or "Yext is just an overlay", critics say. These critics are 100% correct about one thing – Yext is an overlay. But this is by design. A trusted overlay is exactly how you solve the madness.

Historically, to manage a business's local data, experts have advocated a "spray and pray" approach. The strategy behind this approach is that, since the public has no real idea which sources any given publisher uses, and no idea how those sources are ranked, the best idea is to simply "spray" your local data to every known aggregator, update your web site, claim your business, file with all gov't agencies, etc. Then you "pray" that you guessed every source a publisher uses, that their matching process works, and there is no idiosyncrasy that causes your listings to show wrong data.

But with an estimated 20% of searches returning wrong data, and numerous complaints rampant throughout the industry, it's pretty obvious there's a huge problem here.

For clarity, I do not fault the experts for advocating this approach. In the past, it has been the only logical approach. But Yext has invented a better way: an overlay.

Going back to our prior example, let's say Bingo is in the Yext PowerListings Network and accepts local data from the Yext Cloud. And PizzaLand signs up for a PowerListings subscription.

As an overlay, the Yext data source is ranked highest for every element. It short-circuits the rest of Bingo's process. Here's the rank by element:
| Source | Name | Address | Phone | Web Site | Open |
| --- | --- | --- | --- | --- | --- |
| Yext | 1 | 1 | 1 | 1 | 1 |
| Aggregator | 6 | 3 | 4 | 5 | 5 |
| Gov't Data File | 5 | 4 | 2 | 3 | 4 |
| Web Crawl Results | 4 | 5 | 3 | 4 | 6 |
| Claims | 2 | 2 | 6 | 2 | 3 |
| Consumer Signal | 3 | 6 | 5 | 5 | 2 |

Here's how the data from Yext looks in Bingo's source database, waiting for their next build:
| Source | PizzaLand Location |
| --- | --- |
| Yext | **PizzaLand**<br>**212-123-0000**<br>**44 Broad Street, New York, NY 10011**<br>**www.pizzaland.com** |
| Aggregator | PizzaLand<br>212-123-0000<br>44 Broad Street, New York, NY 10011<br>www.pizzaland.com |
| Gov't Data File | Pizzaland, Inc<br>44 Broad Street Suite #75, New York, NY 10011<br>212-123-0000<br>Incorporation Date: 4/30/2013 |
| Web Crawl Results | PizzaLand<br>800-321-1234<br>44 Broad Street, New York, NY 10011<br>www.pizzaland.com |
| Claims | PizzaLand<br>44 Broad Street, New York, NY 10011<br>888-331-3110<br>www.pizzaland.com<br>Claim date: 5/14/13 |
| Consumer Signal | 1 user marked business as closed on 5/17/13 |

When it's time for Bingo to work on the PizzaLand location, since PizzaLand appears in the Yext source, and Yext has the highest trust for every element, all the data Yext supplies shows up in the view. It doesn't matter if the data is present or not in the other sources. It doesn't matter if Bingo didn't match up PizzaLand's locations correctly when running their local data build. It doesn't matter if a rogue consumer marks something as closed. PizzaLand's data in Yext shows up in the live view.
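The short-circuit is easy to see in miniature. In this hypothetical sketch (illustrative names and values only, condensed to a single element), adding an overlay source ranked 1 means its value wins no matter what the other sources contain:

```python
def pick(element, sources, trust):
    """Return the value for one element from the most-trusted source that has it."""
    ranked = sorted(
        (trust[src][element], data[element])
        for src, data in sources.items()
        if element in data
    )
    return ranked[0][1] if ranked else None

trust = {
    "yext":      {"phone": 1},  # the overlay: ranked 1 for every element
    "web_crawl": {"phone": 3},
    "claims":    {"phone": 6},
}
sources = {
    "yext":      {"phone": "212-123-0000"},
    "web_crawl": {"phone": "800-321-1234"},
    "claims":    {"phone": "888-331-3110"},
}
print(pick("phone", sources, trust))  # "212-123-0000" -- the overlay value
```

Nothing about the other sources has to change: as long as the overlay source is present and ranked first, the rest of the ranking never gets a vote.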

The overlay approach wins. The "spray and pray" approach is not necessary because as long as the business maintains an active subscription with Yext, the data in the other sources is not important.

When a business leaves Yext, we don't delete their listing. They are simply no longer active in our cloud, and so the overlay no longer short-circuits a publisher's data compilation process. It's back to the "spray and pray" approach. With few exceptions, usually whatever was happening before starts happening again.

An Active Subscription Proves Existence and Ownership

We created Yext to put businesses in control of their own data, give publishers a trusted source of local data, and to get users the right data in their local searches.

The authoritative objective data about a business is known by the business itself. So, the key for a publisher is to make sure the business actually is real (existence) and that they are receiving information about a business from the authoritative source (ownership).

The best way to knock out both of these goals is with an active, paid subscription from the business itself or an agent of the business. A reasonable, ongoing paid subscription proves continued existence of a business. It proves ownership. It eliminates fraud.

A "claim", even when properly validated, is insufficient to solve the existence and ownership problem. It solves the problem only at the exact point in time when the claim is completed. But what happens if the business changes owners? Or moves completely? Or goes out of business – as a huge percentage of claimed businesses do?

I'm not trying to make a moral argument that businesses should have to pay for their listings. Rather, I'm saying that an ongoing subscription fee to keep listings updated solves a huge structural problem in the industry by proving continued existence and ownership.

Conclusions

Publishers don't have a master database of locations. Typically, they pull in hundreds of sources, which they store. They try to match locations across sources, rank sources at the element level, and periodically rebuild their dataset for their live search based on their trust levels. Claims are not a permanent record. They are just one source among many.

Any listings management requires ongoing work. Whether you do it manually or use Yext (or some combination of both), an active effort is required to maintain proof of the ongoing existence and ownership of the business locations you're managing.

I will leave you with a controversial idea: I actually think Google could solve a lot of their problems by implementing a program similar to the Yext Cloud and charging a reasonable monthly amount for businesses that wish to directly control their data. Who wouldn't want to pay Google a bit every month to guarantee that their listings were up to date? If a user reported something different, this could initiate a challenge for the business to respond to.

In this way, businesses would have control, Google would have continued proof of existence and ownership, and most importantly, end users would always find the right info.
