EPrints Technical Mailing List Archive


Message: #06966



Re: [EP-tech] Experimental Schema.org support for EPrints


Hi all,

It's good to see this being discussed. Just a related aside: in lieu of schema.org support I have used the Google Data Highlighter, which is part of Search Console. It works well for repositories because they naturally display a lot of structured data on summary pages anyway. It acts as a stand-in for schema.org markup and contributes data to the Google Knowledge Graph. Of course, Data Highlighter is no real substitute for schema.org support, because it only works for Google and therefore excludes other search agents, but it's quite a useful, low-barrier tool for notifying Google of structured data.

Cheers

George

> -----Original Message-----
> From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of Denis Pitzalis
> Sent: 21 November 2017 17:07
> To: eprints-tech@ecs.soton.ac.uk
> Subject: Re: [EP-tech] Experimental Schema.org support for EPrints
> 
> Hi Christopher,
> 
> nice to see you are back! Concerning the Schema.org support I did something
> "custom" here:
> 
> http://en.unesco.org/mediabank
> 
> I think the proper way to go would be a plugin though...
> 
> Denis
> 
> 
> On 21/11/2017 17:46, Christopher Gutteridge wrote:
> > Hi, EPrints-tech, long time no see.
> >
> > I've recently rejoined the EPrints.soton.ac.uk support team, and was
> > asked about trying out schema.org support (which Google and Bing like).
> > I'm not a huge fan as I like peer-to-peer data, rather than via the
> > big search engines, but I gave it a go anyway.
> >
> > I have been working on a way to add schema.org support to EPrints.
> > It's using an invisible <div> which may not be everyone's preferred
> > way of doing it, but has the advantage of working well with the
> > citation files.
> >
> > Other options would be to design the entire abstract page around this
> > feature (possible, but work to add to existing sites) or use JSON-LD
> > which is what I would do if I was doing it for just me, but making a
> > configuration file to generate JSON-LD would be more work for me and
> > more of a learning curve for the EPrints admin.
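
For comparison, here is a minimal sketch of what the JSON-LD route mentioned above might emit for a single article. The property names mirror the microdata used in this thread; the title, author, publisher and ISSN values are made-up placeholders, not data from eprints.soton, and nothing in the citation files shown here generates this block.

```xml
<!-- Hypothetical JSON-LD equivalent of the hidden microdata <div>.
     All values below are illustrative placeholders. -->
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "ScholarlyArticle",
  "name": "An example article title",
  "headline": "An example article title",
  "datePublished": "2017-11-21",
  "creator": [
    { "@type": "Person", "name": "A. N. Author" }
  ],
  "publisher": { "@type": "Organization", "name": "Example Press" },
  "isPartOf": { "@type": "Periodical", "issn": "0000-0000" }
}
</script>
```

A block like this carries the same statements without touching the visible page layout, at the cost of needing its own generator rather than reusing the citation mechanism.
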
> >
> > I've added it as a pilot to https://eprints.soton.ac.uk/ (subject to
> > removal or change at any time)
> >
> > See the data extracted from a page here:
> > https://search.google.com/structured-data/testing-tool#url=https%3A%2F
> > %2Feprints.soton.ac.uk%2F50995%2F
> >
> > There's lots more work to polish this, but it's worth showing off now.
> >
> > I've used 3 citation files for this: an outer one to handle the
> > different types (a bit ugly, but it was the solution I came up
> > with), a second one to process the fields that come in a standard
> > install of EPrints, and a third for the fields eprints.soton has
> > customised heavily.
> >
> > In the main summary_page.xml I added:
> >
> >    <epc:print expr="$item.citation('schema_org')" />
> >
> > Which links to schema_org.xml:
> >
> > <?xml version="1.0" ?>
> > <!DOCTYPE html SYSTEM "entities.dtd" >
> >
> > <!--
> >      Full "abstract page" (or splash page or summary page, depending
> > on your jargon) for an eprint.
> > -->
> >
> > <cite:citation xmlns="http://www.w3.org/1999/xhtml"
> > xmlns:epc="http://eprints.org/ep3/control"
> > xmlns:cite="http://eprints.org/ep3/citation" >
> >
> > <div style='display:none'>
> >    <epc:choose>
> >      <epc:when test="type = 'article'">
> >        <div itemscope="itemscope"
> > itemtype="http://schema.org/ScholarlyArticle">
> >          <epc:print expr="$item.citation('schema_org_main')" />
> >        </div>
> >      </epc:when>
> >      <epc:when test="type = 'book'">
> >        <div itemscope="itemscope" itemtype="http://schema.org/Book">
> >          <epc:print expr="$item.citation('schema_org_main')" />
> >        </div>
> >      </epc:when>
> >      <!-- book_section -->
> >      <epc:when test="type = 'conference_item'">
> >        <div itemscope="itemscope"
> > itemtype="http://schema.org/ScholarlyArticle">
> >          <epc:print expr="$item.citation('schema_org_main')" />
> >        </div>
> >      </epc:when>
> >      <epc:when test="type = 'monograph'">
> >        <div itemscope="itemscope"
> > itemtype="http://schema.org/ScholarlyArticle">
> >          <epc:print expr="$item.citation('schema_org_main')" />
> >        </div>
> >      </epc:when>
> >      <!-- patent -->
> >      <epc:when test="type = 'thesis'">
> >        <div itemscope="itemscope" itemtype="http://schema.org/Thesis">
> >          <epc:print expr="$item.citation('schema_org_main')" />
> >        </div>
> >      </epc:when>
> >      <epc:when test="type = 'dataset'">
> >        <div itemscope="itemscope"
> > itemtype="http://schema.org/Dataset">
> >          <epc:print expr="$item.citation('schema_org_main')" />
> >        </div>
> >      </epc:when>
> >      <!-- ad_item // art design item //  -->
> >      <epc:when test="type = 'mu_item'">
> >        <div itemscope="itemscope"
> > itemtype="http://schema.org/MusicComposition">
> >          <epc:print expr="$item.citation('schema_org_main')" />
> >        </div>
> >      </epc:when>
> >      <!-- letter -->
> >      <!-- editorial -->
> >      <epc:when test="type = 'review'">
> >        <div itemscope="itemscope" itemtype="http://schema.org/Review">
> >          <epc:print expr="$item.citation('schema_org_main')" />
> >        </div>
> >      </epc:when>
> >      <!-- special_issue -->
> >      <!-- meeting_abstract -->
> >      <!-- software // SoftwareApplication/ SoftwareSourceCode ?? -->
> >      <epc:when test="type = 'website'">
> >        <div itemscope="itemscope"
> > itemtype="http://schema.org/WebSite">
> >          <epc:print expr="$item.citation('schema_org_main')" />
> >        </div>
> >      </epc:when>
> >
> >      <epc:otherwise>
> >        <div itemscope="itemscope"
> > itemtype="http://schema.org/CreativeWork">
> >          <epc:print expr="$item.citation('schema_org_main')" />
> >        </div>
> >      </epc:otherwise>
> >    </epc:choose>
> > </div>
> >
> > </cite:citation>
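
The types left as comments above could be filled in by following the same pattern. As a sketch only, a book_section branch might look like the following. Mapping it to schema.org's Chapter type is an assumption on my part, not something taken from the pilot, and the element would sit alongside the other epc:when branches inside the epc:choose.

```xml
<!-- Assumed mapping: EPrints 'book_section' -> schema.org Chapter -->
<epc:when test="type = 'book_section'">
  <div itemscope="itemscope" itemtype="http://schema.org/Chapter">
    <epc:print expr="$item.citation('schema_org_main')" />
  </div>
</epc:when>
```
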
> >
> > Each of these options in turn links to the main one,
> > schema_org_main.xml, that uses default EPrints fields:
> >
> > <?xml version="1.0" ?>
> > <!DOCTYPE html SYSTEM "entities.dtd" >
> >
> > <cite:citation xmlns="http://www.w3.org/1999/xhtml"
> > xmlns:epc="http://eprints.org/ep3/control"
> > xmlns:cite="http://eprints.org/ep3/citation" >
> >
> > <div itemprop="name"><epc:print expr="title" /></div>
> > <div itemprop="headline"><epc:print expr="title" /></div>
> > <img itemprop="image"
> > src="http://www.eprints.org/uk/wp-content/uploads/EprintsServices2015icon.jpg"
> > />
> > <epc:if test="abstract">
> >    <div itemprop="description"><epc:print expr="abstract" /></div>
> > </epc:if>
> > <epc:if test="keywords">
> >    <div itemprop="keywords"><epc:print expr="keywords" /></div>
> > </epc:if>
> > <epc:if test="isbn">
> >    <div itemprop="isbn"><epc:print expr="isbn" /></div>
> > </epc:if>
> > <epc:if test="id_number">
> >    <div itemprop="identifier"><epc:print expr="id_number" /></div>
> > </epc:if>
> >
> > <epc:if test="issn or series">
> >    <div itemprop="isPartOf" itemscope="itemscope"
> > itemtype="http://schema.org/Periodical">
> >      <epc:if test="issn"><div itemprop="issn"><epc:print expr="issn"
> > /></div></epc:if>
> >      <epc:if test="series"><div itemprop="name"><epc:print expr="series"
> > /></div></epc:if>
> >    </div>
> > </epc:if>
> >
> > <epc:comment>
> >    <!-- pageEnd and pageStart could go here but are more bother to
> > extract. -->
> > </epc:comment>
> >
> > <epc:if test="pagerange">
> >    <div itemprop="pagination"><epc:print expr="as_string(pagerange)"
> > /></div>
> > </epc:if>
> > <epc:if test="publisher">
> >    <div itemprop="publisher" itemscope="itemscope"
> > itemtype="http://schema.org/Organization">
> >      <div itemprop="name"><epc:print expr="publisher" /></div>
> >    </div>
> > </epc:if>
> > <epc:if test="official_url">
> >    <div itemprop="url"><epc:print expr="official_url" /></div>
> > </epc:if>
> >
> > <epc:if test="creators">
> >    <epc:foreach expr="creators" iterator="person">
> >      <div itemprop="creator" itemscope="itemscope"
> > itemtype="http://schema.org/Person">
> >        <div itemprop="name"><epc:print
> > expr="$person.subproperty('name')" /></div>
> >        <epc:if test="$person.subproperty('id')">
> >          <div itemprop="identifier"><epc:print
> > expr="$person.subproperty('id')" /></div>
> >        </epc:if>
> >      </div>
> >    </epc:foreach>
> > </epc:if>
> > <epc:if test="editors">
> >    <epc:foreach expr="editors" iterator="person">
> >      <div itemprop="editor" itemscope="itemscope"
> > itemtype="http://schema.org/Person">
> >        <div itemprop="name"><epc:print
> > expr="$person.subproperty('name')" /></div>
> >        <epc:if test="$person.subproperty('id')">
> >          <div itemprop="identifier"><epc:print
> > expr="$person.subproperty('id')" /></div>
> >        </epc:if>
> >      </div>
> >    </epc:foreach>
> > </epc:if>
> >
> > <epc:if test="corp_creators">
> >    <epc:foreach expr="corp_creators" iterator="org">
> >      <div itemprop="creator" itemscope="itemscope"
> > itemtype="http://schema.org/Organization">
> >        <div itemprop="name"><epc:print
> > expr="$org" /></div>
> >      </div>
> >    </epc:foreach>
> > </epc:if>
> >
> >
> > <epc:comment>
> >    ADD IN LOCAL EXTENSIONS USING THIS FILE
> > </epc:comment>
> > <epc:print expr="$item.citation('schema_org_local')" />
> >
> > </cite:citation>
> >
> > Finally I created schema_org_local.xml for the fields like date and
> > creators which we've heavily messed around with.
> >
> > <?xml version="1.0" ?>
> > <!DOCTYPE html SYSTEM "entities.dtd" >
> >
> > <!--
> >      Local extra content for schema.org info on summary page.
> >
> >      This file can be used to add new fields that are not standard for
> > EPrints.
> > -->
> >
> > <cite:citation xmlns="http://www.w3.org/1999/xhtml"
> > xmlns:epc="http://eprints.org/ep3/control"
> > xmlns:cite="http://eprints.org/ep3/citation" >
> >
> > <epc:if test="dates">
> >    <epc:foreach expr="dates" iterator="date">
> >      <epc:if test="$date.subproperty('date_type') = 'published'">
> >        <div itemprop="datePublished"><epc:print
> > expr="$date.subproperty('date')" /></div>
> >      </epc:if>
> >      <epc:if test="$date.subproperty('date_type') = 'completed'">
> >        <div itemprop="dateCompleted"><epc:print
> > expr="$date.subproperty('date')" /></div>
> >      </epc:if>
> >    </epc:foreach>
> > </epc:if>
> >
> > <epc:if test="contributors">
> >    <epc:foreach expr="contributors" iterator="person">
> >      <div itemprop="contributor" itemscope="itemscope"
> > itemtype="http://schema.org/Person">
> >        <div itemprop="name"><epc:print
> > expr="$person.subproperty('name')" /></div>
> >        <epc:if test="$person.subproperty('id')">
> >          <div itemprop="identifier"><epc:print
> > expr="$person.subproperty('id')" /></div>
> >        </epc:if>
> >      </div>
> >    </epc:foreach>
> > </epc:if>
> >
> > </cite:citation>
> >
> >
> > I'm not sure how useful all this is but figured I'd throw it out there.
> > It uses a default image as for some reason the Google checker insisted.
> > It doesn't link to files or mention subjects, doesn't include URIs
> > properly and doesn't link to ORCID etc. (which is data we have in
> > eprints.soton).
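
On the point about URIs: one possible refinement, sketched here under assumptions, is to emit the official URL as a link element rather than as text inside a div, so that microdata consumers read it as a URL value. This relies on the curly-brace attribute substitution that EPrints citation files use elsewhere (for example in document links on the default summary page); treat that as an assumption and check it against your EPrints version before relying on it.

```xml
<!-- Sketch: expose official_url as a URL-valued microdata property.
     Assumes {expr} substitution works in attribute values here. -->
<epc:if test="official_url">
  <link itemprop="url" href="{official_url}" />
</epc:if>
```
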
> >
> >
> >
>