There are several “in-page” meta data formats, mostly created by big online services that each need to index millions of pages. I have notes from 2010 (not of publishable quality), but they are too old. By the end of writing these notes, I was getting blurry about the differences between these specs; aside from namespaces and small schema changes, they all address the same problem domain and are very similar.

Tech specs

For a lot of research like this, I start by defining what success looks like; here is an example 1. According to 2, there are 34 published micro-format specs. The three most widely used ones are calendar 3, hCard 4 and vCard 5. I use calendar a lot in work email, but never have on a website. Quite a few on that list don't seem to be used on the web, but may be used elsewhere on the internet. This is a list of every meta data format available 6. I don't have any measure of how representative this population analysis is, but 7 states that Open Graph and Twitter cards are the most used formats (data taken in Feb 2021). It could be an analysis of everyone selling tea, and so less relevant to other market sectors.

Schema.org is a large invention for meta data 8 9, with examples at 10; first public release was 2012 11. To use business language, it is the technical solution for “rich snippets”, the terse summary of a webpage shown on search engine results pages. It is quite a simple concept that borrows from preceding XML infrastructure, and it is mostly self-documenting (à la other XML systems). Those first two links cover everything from a technical perspective. Like many “systems like XML”, schema.org is extendable; some people use FAQ structures 12. The home page states that this markup is supposed to be used in a wide variety of places. Some customer analysis 13 from 2017 reports: “ACM Queue put the figure at 31.3%, while a study by Bing and Catalyst found that just 17% of marketers use Schema.org markup.”

A meta data format in wide use today is JSON-LD (“LD JSON”) 14; the current edition of the specification is 15. There are some clear examples 16. The goal of JSON-LD is to allow easy access to LD, or linked data 17. Linked data was initiated by Dr T Berners-Lee, with the objective of relating this document (and its author etc) to other pages on the internet. This technology shares a solution space with LINK elements. Google advertise a technology to allow addition of JSON-LD to each of your webpages 18, but the notes make it look like a “data entry hell”. Assuming that you are using WordPress (a very common platform), there are specific integrations 19 20 21 22. If you are a business person who uses technology without liking it, read 23.
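
To make the JSON-LD idea concrete, here is a minimal sketch in Python (standard library only) that turns a dictionary into the script element a page would carry. The function name json_ld_script and the sample article values are my own placeholders, not taken from any of the linked docs; the "@context" and "@type" keys and the application/ld+json type are the standard ones.

```python
import json


def json_ld_script(data: dict) -> str:
    """Serialise a dict as a JSON-LD <script> element (hypothetical helper)."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # "</" inside a script body could close the element early, so escape
    # the slash; "\/" is a valid JSON escape and decodes back to "/".
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Placeholder values for illustration only.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Notes on in-page meta data formats",
    "datePublished": "2021-02-01",
    "author": {"@type": "Person", "name": "Example Author"},
}

snippet = json_ld_script(article)
print(snippet)
```

The output would be pasted into the HEAD of the page. Generating it from data the site already holds is one way to avoid the “data entry hell”.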

OGP, the Open Graph protocol 24 25, was released by Facebook Inc in 2010 (since this is a tool that lets other parties give data to FB, FB would have published it as soon as it was completed). This 26 explains the “open graph” aspect. Again, it is very similar to other protocols 27. According to 28, all OGP data should be applied to META elements, not other tags.
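
As a rough illustration of what a consumer of OGP does, the sketch below reads og: properties from META elements only and ignores every other tag, in line with the rule above. It uses only Python's standard html.parser; the class name, function name and sample page are made up for this example, and real crawlers will certainly do more than this.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect og:* properties found on <meta> elements (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return  # OGP data on other tags is deliberately ignored
        attr = dict(attrs)
        prop = attr.get("property") or ""
        content = attr.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property.
            self.properties.setdefault(prop, content)


def read_open_graph(html_text: str) -> dict:
    parser = OpenGraphParser()
    parser.feed(html_text)
    return parser.properties


# Made-up sample page.
SAMPLE = """<html><head>
<title>Example page</title>
<meta property="og:title" content="Example page">
<meta property="og:type" content="article">
<meta property="og:url" content="https://example.com/page">
<meta name="description" content="Not OGP, so ignored">
</head><body><p property="og:title">Also ignored</p></body></html>"""

og_data = read_open_graph(SAMPLE)
print(og_data)
```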

RDFa was started in 2004; its public face is 29, it has a spec published by W3C 30, and there are some less technical notes 31. As with all of these formats, the objective is to have computers better understand the semantics of a document made for humans 32. RDF samples: 33 34.

There is a website called micro-formats which claims that TIME and ARTICLE elements are a micro-format. I think this is a stretch of the definition 35. I do use these, for their respective semantic purposes, but I don't think they are a micro-format any more than INPUT or TABLE elements are. Marking “time content” in TIME elements is recommended, so it can be localised; some fairly old example code 36. A more modern approach would probably use whatever JS framework you are using for the rest of the app.
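
For the TIME element, the useful part is the machine-readable datetime attribute; the visible text can then be localised freely. A small server-side sketch in Python, assuming timezone-aware datetimes (the helper name time_element and the sample date are mine):

```python
from datetime import datetime, timezone
from html import escape


def time_element(moment: datetime, label: str) -> str:
    """Wrap a human-readable label in a <time> element (hypothetical helper)."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    # Normalise to UTC so the attribute is unambiguous for any reader.
    machine = moment.astimezone(timezone.utc).isoformat(timespec="seconds")
    return f'<time datetime="{escape(machine)}">{escape(label)}</time>'


published = datetime(2021, 2, 14, 9, 30, tzinfo=timezone.utc)
markup = time_element(published, "14 February 2021, 09:30 UTC")
print(markup)
# <time datetime="2021-02-14T09:30:00+00:00">14 February 2021, 09:30 UTC</time>
```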

In “ye olden days”, there was an HTML3 tag called meta, which contained data for early spiders. This is not really used anymore (I keep a backup char encoding tag, and obviously the TITLE tag is still current), but for completeness see 37 38 39 40 41. This link 42 is a list of things that Google search pays attention to. As a retired technology, it is not interesting.

To support their advertising better, Google offer tools 43 44. They state that the first one is going to be shut down soon. I find the output of the tools similar to older versions of G++ (the C++ compiler), e.g. “you have an error somewhere before line 3464”.

Search engines 45 46

As far as I know, G+ is not a currently running service/SaaS; however there was (in 2009) a lot of integration of G+ features into other tools, so supporting any meta data for G+ may still be profitable 47 48. To get pages into Google search, use the JSON-LD and schema.org markup described elsewhere 49 50 51.
There is custom meta data for Gmail 52 53 54, which looks like standard schema.org. This reference is very long, but scroll to the section on Gmail; it is quite thorough 55.
When a user creates an event or place in Google Maps, it helps if a very similar range of meta data is applied 56.

Bing use Schema.org, JSON-LD and RDFa 57 58

I can't find a reference detailing what meta data DuckDuckGo 59 uses. I would be quite surprised if it wasn't the same as every other vendor.

Yandex 60 supplies docs for popular metadata formats 61 62; and lists meta tags 63, but I can't see where it mentions more recent micro-formats.

Yahoo is an old search engine 64 but according to the wiki is currently using Bing technology (and previously Google technology).

I suspect that if I could read technical Mandarin, I would find that Baidu 65 search has docs like Google does. As a note for Europeans, try 66.

Dogpile 67 68 is a search engine I have never used. Apparently it is a meta indexer, but it doesn't publish many docs. If it buys most of its indexes, this would not surprise me.

Alltheweb was a minority search engine, which I used in around 2002, as it didn't send the searches that you used back to the search server (a noticeable speed improvement on my DSL/DNS setup at the time). Today it no longer exists 69.

Social platform

Twitter have been using micro-formats for a long time. The first edition of the markup 70 is from 2006, about 5 years older than most of these projects. Probably at the same time as the release of Bootstrap 71, Twitter started to process CSS class names as meta data 72. Twitter markup: 73.

Recently LinkedIn has been trying to avoid any external links, so it doesn't care about micro-formats at all.

Reddit as a platform doesn't seem to scrape data from webpages that are posted as links. The links are presented as bare hyperlinks. The Reddit “howtos” talk about manual image uploading if you want a branding image. Some sources state that Reddit does read OGP data 74.

When you post a link on an FB wall, FB extracts OGP data to decorate the link. Facebook Inc have published some dev docs 75 76 77. FB also uses the hCalendar protocol. One gets better presentation if there is OGP data for the “preferred snippet image”. Platforms that read OGP (FB, Reddit) need particular handling for images in order to get the best results (as would be expected). A recipe from 2014 is 78.
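
A hedged sketch of the image handling, in Python: the og:image property and its structured companions (og:image:width, og:image:height, og:image:alt) are part of the OGP spec, and the companions should directly follow the image they describe. The helper name, URLs and the 1200 by 630 size are placeholder assumptions of mine, not values taken from the FB docs linked above.

```python
from html import escape


def og_meta_tags(properties) -> str:
    """Render (property, value) pairs as OGP <meta> elements, in order."""
    lines = []
    for name, value in properties:
        lines.append(
            f'<meta property="{escape(name)}" content="{escape(str(value))}">'
        )
    return "\n".join(lines)


# Placeholder page and image details; the size is an assumed example.
tags = og_meta_tags([
    ("og:title", "Example page"),
    ("og:type", "article"),
    ("og:url", "https://example.com/page"),
    ("og:image", "https://example.com/img/snippet.png"),
    # Structured properties directly follow the og:image they describe.
    ("og:image:width", 1200),
    ("og:image:height", 630),
    ("og:image:alt", "Diagram of meta data formats"),
])
print(tags)
```

A list of pairs is used rather than a dict so that the order is explicit and a page can declare more than one image.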
