Wednesday, October 19, 2016 - 11:17am
Hereby we announce the release of DBpedia 2016-04. The new release is based on updated Wikipedia dumps dating from March/April 2016 featuring a significantly expanded base of information as well as richer and (hopefully) cleaner data based on the DBpedia ontology.
During the latest DBpedia meeting in Leipzig we discussed ways to support DBpedia and what benefits this support would bring. For the next two months, we are aiming to raise money to support the hosting of the main services and the next DBpedia release (especially to shorten release intervals). On top of that, we need to buy a new server to host DBpedia Spotlight, which has so far been generously hosted by third parties. If you use DBpedia and want us to keep going forward, we kindly invite you to donate here or become a member of the DBpedia association.
The English version of the DBpedia knowledge base currently describes 6.0M entities of which 4.6M have abstracts, 1.53M have geo coordinates and 1.6M depictions. In total, 5.2M resources are classified in a consistent ontology, consisting of 1.5M persons, 810K places (including 505K populated places), 490K works (including 135K music albums, 106K films and 20K video games), 275K organizations (including 67K companies and 53K educational institutions), 301K species and 5K diseases. The total number of resources in English DBpedia is 16.9M that, besides the 6.0M resources, includes 1.7M skos concepts (categories), 7.3M redirect pages, 260K disambiguation pages and 1.7M intermediate nodes.
Altogether the DBpedia 2016-04 release consists of 9.5 billion (2015-10: 8.8 billion) pieces of information (RDF triples) out of which 1.3 billion (2015-10: 1.1 billion) were extracted from the English edition of Wikipedia, 5.0 billion (2015-10: 4.4 billion) were extracted from other language editions and 3.2 billion (2015-10: 3.2 billion) from DBpedia Commons and Wikidata. In general, we observed a growth in mapping-based statements of about 2%.
The DBpedia community added new classes and properties to the DBpedia ontology via the mappings wiki. The DBpedia 2016-04 ontology encompasses:
- 754 classes (DBpedia 2015-10: 739)
- 1,103 object properties (DBpedia 2015-10: 1,099)
- 1,608 datatype properties (DBpedia 2015-10: 1,596)
- 132 specialized datatype properties (DBpedia 2015-10: 132)
- 410 owl:equivalentClass and 221 owl:equivalentProperty mappings to external vocabularies (DBpedia 2015-04: 407 and 221, respectively)
The editor community of the mappings wiki also defined many new mappings from Wikipedia templates to DBpedia classes. For the DBpedia 2016-04 extraction, we used a total of 5800 template mappings (DBpedia 2015-10: 5553 mappings). For the second time the top language, gauged by the number of mappings, is Dutch (646 mappings), followed by the English community (604 mappings).
- In addition to the datasets normalized to English DBpedia (en-uris), we provide normalized datasets based on the DBpedia Wikidata (DBw) datasets (wkd-uris). These sorted datasets will be the foundation for the upcoming fusion process with Wikidata. The DBw-based URIs will be the only ones provided from the following releases on.
- We now filter out triples from the Raw Infobox Extractor that are already mapped. E.g. no more “<x> dbo:birthPlace <z>” and “<x> dbp:birthPlace|dbp:placeOfBirth|… <z>” in the same resource. These triples are now moved to the “infobox-properties-mapped” datasets and not loaded on the main endpoint. See issue 22 for more details.
- Major improvements in our citation extraction. See here for more details.
- We incorporated the statistical distribution approach of Heiko Paulheim for creating type statements automatically and provide them as an additional dataset (instance_types_sdtyped_dbo).
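To illustrate the filtering of already-mapped raw infobox triples described in the list above, here is a minimal Python sketch. It assumes (as a simplification, not necessarily what the extraction framework does) that a raw triple counts as already mapped when a mapping-based triple with the same subject and object exists. The function name `split_raw_infobox` and the sample triples are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def split_raw_infobox(
    mapped: Iterable[Triple], raw: Iterable[Triple]
) -> Tuple[List[Triple], List[Triple]]:
    """Split raw infobox triples into (kept, already_mapped)."""
    # Assumption: a raw triple is "already mapped" when some mapping-based
    # triple shares its subject and object, whatever the predicate is.
    mapped_pairs: Set[Tuple[str, str]] = {(s, o) for s, _, o in mapped}
    kept: List[Triple] = []
    already_mapped: List[Triple] = []
    for triple in raw:
        s, _, o = triple
        if (s, o) in mapped_pairs:
            already_mapped.append(triple)  # would go to "infobox-properties-mapped"
        else:
            kept.append(triple)
    return kept, already_mapped


# Hypothetical sample data using prefixed names for brevity.
mapped = [("dbr:Ada_Lovelace", "dbo:birthPlace", "dbr:London")]
raw = [
    ("dbr:Ada_Lovelace", "dbp:birthPlace", "dbr:London"),
    ("dbr:Ada_Lovelace", "dbp:placeOfBirth", "dbr:London"),
    ("dbr:Ada_Lovelace", "dbp:knownFor", "dbr:Analytical_Engine"),
]
kept, already_mapped = split_raw_infobox(mapped, raw)
```

In this sample, only the `dbp:knownFor` triple stays in the raw infobox output; the two birth place variants are set aside because `dbo:birthPlace` already covers them.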
In case you missed it, what we changed in the previous release (2015-10):
- English DBpedia switched to IRIs. This can be a breaking change to some applications that need to change their stored DBpedia resource URIs / links. We provide the “uri-same-as-iri” dataset for English to ease the transition.
- The instance-types dataset is now split into two files: instance-types (containing only direct types) and instance-types-transitive containing the transitive types of a resource based on the DBpedia ontology
- The mappingbased-properties file is now split into three (3) files:
- “geo-coordinates-mappingbased” that contains the coordinates originating from the mappings wiki. The “geo-coordinates” dataset continues to provide the coordinates originating from the GeoExtractor
- “mappingbased-literals” that contains mapping-based facts with literal values
- “mappingbased-objects” that contains mapping-based facts with object values
- the “mappingbased-objects-disjoint-[domain|range]” are facts that are filtered out from the “mappingbased-objects” datasets as errors but are still provided
- We added a new extractor for citation data that provides two files:
- citation links: linking resources to citations
- citation data: trying to get additional data from citations. This is a quite interesting dataset but we need help to clean it up
- All datasets are available in .ttl and .tql serialization (the .nt and .nq datasets were dropped for reasons of redundancy and server capacity).
- Dataset normalization: We are going to normalize datasets based on Wikidata URIs and no longer on the English language edition, as a prerequisite to finally start the fusion process with Wikidata.
- RML Integration: Wouter Maroy has already provided the necessary groundwork on GitHub for switching the mappings wiki to an RML-based approach. We are not there yet but this is at the top of our list of changes.
- Starting with the next release we are adding datasets with NIF annotations of the abstracts (as we already provided those for the 2015-04 release). We will eventually extend the NIF annotation dataset to cover the whole Wikipedia article of a resource.
- SDTypes: We extended the coverage of the automatically created type statements (instance_types_sdtyped_dbo) to English, German and Dutch (see above).
- Extensions: In the extension folder (2016-04/ext) we provide two new datasets, both are to be considered in an experimental state:
- DBpedia World Facts: This dataset is authored by the DBpedia association itself. It lists all countries, all currencies in use and (most) languages spoken in the world as well as how these concepts relate to each other (spoken in, primary language etc.) and useful properties like ISO codes (ontology diagram). This dataset extends the very useful LEXVO dataset with facts from DBpedia and the CIA Factbook. Please report any errors or suggestions regarding this dataset to Markus.
- Lector Facts: This experimental dataset was provided by Matteo Cannaviccio and demonstrates his approach to generating facts by using common sequences of words (i.e. phrases) that are frequently used to describe instances of binary relations in a text. We are looking into using this approach as a regular extraction step. It would be helpful to get some feedback from you.
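The split between direct and transitive types mentioned above for the instance-types dataset can be sketched in a few lines of Python. This is only an illustration: the `SUBCLASS_OF` excerpt and the function name `transitive_types` are hypothetical, and we assume here that the transitive file lists the superclasses of the direct type without repeating the direct type itself.

```python
from typing import Dict, List, Optional

# Illustrative excerpt of a class hierarchy: class -> direct superclass.
# This is a hand-written assumption, not loaded from the real ontology file.
SUBCLASS_OF: Dict[str, Optional[str]] = {
    "dbo:MusicalArtist": "dbo:Artist",
    "dbo:Artist": "dbo:Person",
    "dbo:Person": "dbo:Agent",
    "dbo:Agent": None,
}


def transitive_types(
    direct_type: str, subclass_of: Dict[str, Optional[str]]
) -> List[str]:
    """Return all superclasses of direct_type, nearest first."""
    result: List[str] = []
    seen = {direct_type}  # guards against accidental cycles in the hierarchy
    current = subclass_of.get(direct_type)
    while current is not None and current not in seen:
        result.append(current)
        seen.add(current)
        current = subclass_of.get(current)
    return result


# A resource typed directly as dbo:MusicalArtist would get these extra types.
inherited = transitive_types("dbo:MusicalArtist", SUBCLASS_OF)
```

Keeping the direct type in one file and the inherited ones in another lets consumers load only the most specific type when that is all they need.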
Lots of thanks to
- Markus Freudenberg (University of Leipzig / DBpedia Association) for taking over the whole release process and creating the revamped download & statistics pages.
- Dimitris Kontokostas (University of Leipzig / DBpedia Association) for conveying his considerable knowledge of the extraction and release process.
- All editors that contributed to the DBpedia ontology mappings via the Mappings Wiki.
- The whole DBpedia Internationalization Committee for pushing the DBpedia internationalization forward.
- Heiko Paulheim (University of Mannheim) for providing the necessary code for his algorithm to generate additional type statements for formerly untyped resources and to identify and remove wrong statements, which is now part of the DIEF.
- Václav Zeman, Thomas Klieger and the whole LHD team (University of Prague) for their contribution of additional DBpedia types
- Marco Fossati (FBK) for contributing the DBTax types
- Alan Meehan (TCD) for performing a big external link cleanup
- Aldo Gangemi (LIPN University, France & ISTC-CNR, Italy) for providing the links from DOLCE to DBpedia ontology.
- Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink Software) for loading the new data set into the Virtuoso instance that provides 5-Star Linked Open Data publication and SPARQL Query Services.
- OpenLink Software (http://www.openlinksw.com/) collectively for providing the SPARQL Query Services and Linked Open Data publishing infrastructure for DBpedia in addition to their continuous infrastructure support.
- Ruben Verborgh from Ghent University – iMinds for publishing the dataset as Triple Pattern Fragments, and iMinds for sponsoring DBpedia’s Triple Pattern Fragments server.
- Ali Ismayilov (University of Bonn) for extending the DBpedia Wikidata dataset.
- Vladimir Alexiev (Ontotext) for leading a successful mapping and ontology clean up effort.
- All the GSoC students and mentors who directly or indirectly influenced the DBpedia release
- Special thanks to members of the DBpedia Association, the AKSW and the department for Business Information Systems of the University of Leipzig.
The work on the DBpedia 2016-04 release was financially supported by the European Commission through the project ALIGNED – quality-centric, software and data engineering (http://aligned-project.eu/). More information about DBpedia is found at http://dbpedia.org as well as in the new overview article about the project available at http://wiki.dbpedia.org/Publications.
Have fun with the new DBpedia 2016-04 release!
Tuesday, October 18, 2016 - 10:35am
Very shortly after the largest DBpedia meeting to date, we are crossing the Atlantic for the second time. We are happy to announce that the 8th DBpedia Community Meeting will be held in Sunnyvale on October 27th 2016, hosted by Yahoo.
Please read below on different ways you can participate. We are looking forward to meeting again in person with the US-based DBpedia community.
- Web URL: http://wiki.dbpedia.org/meetings/California2016
- Hashtag: #DBpediaCA
- When: October 27th, 2016
- Where: Yahoo (building E), 701 First Avenue, Sunnyvale, CA.
- Host: Yahoo
- Call for Contribution: Submit your proposal in our form
- Registration: through eventbrite (limited seats)
If you would like to become a sponsor for the 8th DBpedia Meeting, please contact the DBpedia Association.
- Yahoo!: For hosting the meeting and the catering
- Google Summer of Code 2016: Amazing program and the reason some of our core DBpedia devs are visiting California
- ALIGNED – Software and Data Engineering: For funding the development of DBpedia as a project use-case and covering part of the travel cost
- Institute for Applied Informatics: For supporting the DBpedia Association
- OpenLink Software: For continuous hosting of the main DBpedia Endpoint
- Nicolas Torzec, Yahoo Knowledge Graph.
- Pablo N. Mendes, Lattice Data Inc.
- Dimitris Kontokostas, DBpedia Association and AKSW, Uni Leipzig
- Sebastian Hellmann, DBpedia Association and AKSW, Uni Leipzig
Attending the DBpedia Community meeting is free of charge, but seats are limited. Make sure to register to reserve a seat.
Call for Contribution
Please submit your proposal through our form. Contribution proposals may include (but are not limited to) presentations, demos, lightning talks, panels and session suggestions. We intend to accept as many proposals as possible in the available meeting time.
The meeting will take place at the Yahoo headquarters in Sunnyvale. Address: Yahoo! (Building E, 701 First Avenue, Sunnyvale, CA)
Your DBpedia Association
Friday, September 30, 2016 - 4:40pm
After the success of the last two community meetings in Palo Alto and in The Hague we thought it was time to meet in Leipzig, where the DBpedia Association is located. During the SEMANTiCS 2016 in Leipzig, Sep 12-15, the DBpedia community met on the 15th of September. First and foremost, we would like to thank the Institute for Applied Informatics for supporting our community, the University of Leipzig for hosting our meeting and the SEMANTiCS for hosting and sponsoring the meeting.
During the opening session, Lydia Pintscher, product manager of Wikidata, presented Wikidata: bringing structured data to Wikipedia with 16000 volunteers. Lydia described similarities and differences between DBpedia and Wikidata and talked about prospective steps for Wikidata. Harald Sack from the Hasso-Plattner-Institut also spoke during the opening session. He introduced the dwerft Project – DBpedia and Linked Data for the Media Value Chain, which aims to build the common technology platform »Linked Production Data Cloud«.
The DBpedia showcase session started with the DBpedia 2016-04 release update by Markus Freudenberg (AKSW/KILT). At this session, six speakers presented how to utilize DBpedia in novel and interesting ways. For example:
- Miel Vander Sande (iMinds) talked about DBpedia Archives as Memento with Triple Pattern Fragments.
- Jörn Hees (DFKI) introduced us to Human associations in the Semantic Web and DBpedia.
- Peter de Laat from GoUnitive urged the community to personalize user interaction in a Linked Data environment.
DBpedia Association hour
The 7th edition of the community meeting covered the first DBpedia Association hour, which provided a platform for the community to discuss and give feedback. Sebastian Hellmann (AKSW, KILT), Julia Holze (DBpedia Association) and Dimitris Kontokostas (AKSW, KILT) gave an update on the DBpedia Association status. We talked about our technical progress, DBpedia funding and visions. Sebastian Hellmann introduced the Board of Trustees, which is the main decision-making body of the DBpedia Association and oversees the association and its work as its ultimate corporate authority.
Enno Meijers (KB) of the Dutch DBpedia chapter announced a successful cooperation between Huygens ING, iMinds/Univ. Gent, Vrije Universiteit Amsterdam, Institute for Sound and Vision, Koninklijke Bibliotheek (KB) and the NL-DBpedia community. By signing the Memorandum of Understanding (MoU) they officially support the goals of the DBpedia Association and strengthen the Dutch chapter and community.
You will find community feedback and all questions which we discussed at the first DBpedia Association hour here: https://pad.okfn.org/p/how-to-improve-DBpedia. Participants who wanted to learn DBpedia basics joined the DBpedia tutorial session by Markus Freudenberg (AKSW/KILT).
The sessions in the afternoon highlighted two important fields of research and development, namely DBpedia ontology and DBpedia & NLP. At the DBpedia ontology session, Wouter Maroy (iMinds) presented DBpedia RML mappings, which he created during this year’s Google Summer of Code project and Gerard Kuys (Ordina) discussed the question ‘Does extraction prelude structure?’ with the DBpedia ontology group. At the same time, Milan Dojchinovski (AKSW/KILT) chaired the DBpedia & NLP session with eight very interesting talks. You will find all presentations given during this session on our website. The last two presentations Analyzing and improving the Polish Wikipedia Citations (part of the Wikipedia References & Citations challenge) and Greek DBpedia updates were given by Krzysztof Węcel (Poznan University) and Sotiris Karampatakis (OKF Greece).
In the closing session we wrapped up the meeting and gave out our prizes:
- The “DBpedia Excellence in Engineering” prize went to Markus Freudenberg for keeping up with the DBpedia releases.
- The “Citations Challenge prize” went to Krzysztof Węcel for his very thorough citation analysis.
Summing up, the event brought together more than 150 DBpedians from Europe who engaged in vital conversations about interesting projects and approaches to questions/problems revolving around DBpedia. We would like to thank the organizers Magnus Knuth (HPI, DBpedia German & Commons), Monika Solanki (University of Oxford) and representatives of the DBpedia Association such as Dimitris Kontokostas, Sebastian Hellmann and Julia Holze for devoting their time to the organization of the meeting and the program.
We are now looking forward to the 8th DBpedia Community Meeting (which is most probably coming sooner than you think, across the Atlantic). Check our website for further updates or follow #DBpedia on Twitter.
Your DBpedia Association.
Wednesday, September 7, 2016 - 2:48pm
During the SEMANTiCS 2016 in Leipzig, Sep 12-15, the DBpedia community will get together on the 15th of September for the 7th edition of the DBpedia Community Meeting. The meeting will take place at the University of Leipzig (Augustusplatz 10, 04109 Leipzig, Germany). See here for detailed directions.
Over 140 participants have registered for the next DBpedia Community Meeting, and only a few seats are left. So come and get your ticket to be part of this event.
The 7th edition of this event covers the first DBpedia Association hour, which provides a platform for the community to discuss and give feedback. On top of that, we will have a DBpedia showcase session on DBpedia+ Data Stack 2016-04 – Release and talks about Human associations in the Semantic Web and DBpedia, DBpedia Archives as Memento with Triple Pattern Fragments and Towards a Unified PageRank for DBpedia and Wikidata. Our event features a dev & tutorial session to learn about DBpedia as well as a DBpedia ontology session and a DBpedia & NLP session.
Lydia Pintscher, product manager of Wikidata, will speak about Wikidata: bringing structured data to Wikipedia with 16000 volunteers, and Harald Sack from the Hasso-Plattner-Institut will speak about the dwerft Project – DBpedia and Linked Data for the Media Value Chain. At the end of the meeting there will be a session for the “DBpedia references and citations challenge”; submissions will be judged by the Organizing Committee and the best two will receive a prize.
Attending the DBpedia Community meeting is free, but you need to register here. Optionally, if you would like to support DBpedia with a little more than your presence during the event, you can choose a DBpedia support ticket. Have a look here:
We would like to thank the following organizations for sponsoring and supporting our endeavour.
- University of Leipzig (http://www.uni-leipzig.de/)
- ALIGNED Project (http://aligned-project.eu/)
- Institute for Applied Informatics (InfAI, http://infai.org/en/AboutInfAI)
- OpenLink Software (http://www.openlinksw.com/)
- SEMANTICS Conference Sep 12-15, 2016 in Leipzig (http://2016.semantics.cc/)
Your DBpedia Association
Friday, June 24, 2016 - 4:38pm
Following our successful meetings in Europe & US, our next DBpedia meeting will be held in Leipzig on September 15th, co-located with SEMANTiCS.
* Highlights *
– Keynote by Lydia Pintscher, Wikidata
– A session for the “DBpedia references and citations challenge”
– A session on DBpedia ontology by members of the DBpedia ontology committee
– Tell us what cool things you do with DBpedia: https://goo.gl/AieceU
– As always, there will be tutorials to learn about DBpedia
* Quick facts *
– Web URL: http://wiki.dbpedia.org/meetings/Leipzig2016
– Hashtag: #DBpediaLeipzig
– When: September 15th, 2016
– Where: University of Leipzig, Augustusplatz 10, 04109 Leipzig
– Call for Contribution: submission form
– Registration: Free to participate but only through registration (Option for DBpedia support tickets)
* Sponsors and Acknowledgments *
– Institute for Applied Informatics (InfAI)
– SEMANTICS Conference (Sep 12-15, 2016 in Leipzig)
If you would like to become a sponsor for the 7th DBpedia Meeting, please contact the DBpedia Association (firstname.lastname@example.org).
* Organisation *
– Magnus Knuth, HPI, DBpedia German/Commons
– Monika Solanki, University of Oxford, DBpedia Ontology
– Julia Holze, DBpedia Association
– Dimitris Kontokostas, AKSW/KILT, DBpedia Association
– Sebastian Hellmann, AKSW/KILT, DBpedia Association
Your DBpedia Association
Tuesday, June 7, 2016 - 12:04pm
In the latest release (2015-10) DBpedia started exploring the citation and reference data from Wikipedia and we were pleasantly surprised by the rich data we managed to extract.
This data holds huge potential, especially for the Wikidata challenge of providing a reference source for every statement. It describes not only a lot of bibliographical data, but also a lot of web pages and many other sources around the web.
The data we extract at the moment is quite raw and can be improved in many different ways. Some of the potential improvements are:
- Extend the citation extractor to handle other Wikipedia language editions; currently only English Wikipedia is supported.
- Map the data to a relevant Bibliographic ontology (there are many candidates and, although BIBO got most votes, we are open to other ontologies)
- Map the data to existing Bibliographic LOD (eg TEL has 100M records, Worldcat 300M) or online books (eg Google Books). See the citationIri issue.
- Ways to merge / fuse identical citations from multiple articles
- Use the citation data in the Wikidata primary sources tool
- Surprise us with your ideas!
We welcome contributions that improve the existing citation dataset in any way, and we are open to collaborating and helping. Results will be presented at the next DBpedia meeting: 15 September 2016 in Leipzig, co-located with SEMANTiCS 2016. Each participant should submit a short description of his/her contribution by Monday 12 September 2016 and present his/her work at the meeting. Comments and questions can be posted on the DBpedia discussion & developer lists or on our new DBpedia ideas page.
Submissions will be judged by the Organizing Committee and the best two will receive a prize.
- Vladimir Alexiev, Ontotext and DBpedia BG
- Anastasia Dimou, Ghent University, iMinds
- Dimitris Kontokostas, KILT/AKSW, DBpedia Association
Your DBpedia Association
Thursday, June 2, 2016 - 9:14am
DBpedia will be part of the 19th International Conference on Business Information Systems (6-8 July 2016) at the University of Leipzig. The conference addresses a wide scientific community and experts involved in the development of business computing applications. The three-day conference program is a mix of workshops, tutorials and paper sessions. Below you will find more information about the DBpedia tutorial:
Wednesday, July 6th, 2016
DBpedia Tutorial on Semantic Knowledge Integration in established Data (IT) Environments
Enriching data with a semantic layer and linking entities is key to what is loosely called Smart Data. An easy, yet comprehensive way of achieving this is the use of Linked Data standards.
In this DBpedia tutorial, we will introduce
- the basic ideas of Linked Data and other Semantic Web standards
- existing open datasets that can be freely reused (including DBpedia of course)
- software and services in the DBpedia infrastructure such as the DBpedia SPARQL service, the lookup service and the DBpedia Spotlight Entity Linking service
- common business use cases that will help to apply the learned lessons into practice
- integration example into a hypothetical environment
In particular, we would like to show how to seamlessly integrate Linked Data technologies into existing IT- and data-environments and discuss how to link private corporate data knowledge graphs to DBpedia and Linked Open Data. Another special focus is on finding links in text and unstructured data.
2 x 90 minutes (half day)
- Practitioners that would like to learn about linked data and take home the know-how to apply it in their organisation
- Researchers and students that would like to use linked data in their research
The tutorial is held by core members of the DBpedia Association and members of the AKSW/KILT research group in the context of three large research projects:
Your DBpedia Association
Tuesday, April 26, 2016 - 10:44am
DBpedia participated for a fourth time in the Google Summer of Code program. This was a quite competitive year (like every year) where more than forty students applied for a DBpedia project. In the end, 8 great students from all around the world were selected and will work on their projects during the summer. Here’s a detailed list of the projects:
A Hybrid Classifier/Rule-based Event Extractor for DBpedia Proposal by Vincent Bohlen
In modern times the amount of information published on the internet is growing to an immeasurable extent. Humans are no longer able to gather all the available information by hand but are more and more dependent on machines collecting relevant information automatically. This is why automatic information extraction, and especially automatic event extraction, is important. In this project I will implement a system for event extraction using Classification and Rule-based Event Extraction. The underlying data for both approaches will be identical. I will gather Wikipedia articles and perform a variety of NLP tasks on the extracted texts. First I will annotate the named entities in the text using named entity recognition performed by DBpedia Spotlight. Additionally I will annotate the text with Frame Semantics using FrameNet frames. I will then use the collected information, i.e. frames, entities, entity types, with the aforementioned two different methods to decide if the collection is an event or not. Mentor: Marco Fossati (SpazioDati)
Automatic mappings extraction by Aditya Nambiar
DBpedia currently maintains a mapping from Wikipedia info-box properties to the DBpedia ontology, since several similar templates exist to describe the same type of info-box. The aim of the project is to enrich the existing mappings and possibly correct incorrect mappings using Wikidata.
Several Wikipedia pages use Wikidata values directly in their infoboxes. Hence, by using the mapping between Wikidata properties and DBpedia ontology classes along with the info-box data across several such wiki pages, we can collect several such mappings. The first phase of the project revolves around using various such Wikipedia templates, finding their usages across the Wikipedia pages and extracting as many mappings as possible.
In the second half of the project we use machine learning techniques to take care of any accidental / outlier usage of Wikidata mappings in Wikipedia. At the end of the project we will be able to obtain a correct set of mappings which we can use to enrich the existing mappings. Mentor: Markus Freudenberg (AKSW/KILT)
Combining DBpedia and Topic Modelling by wojtuch
DBpedia, a crowd- and open-sourced community project extracting the content from Wikipedia, stores this information in a huge RDF graph. DBpedia Spotlight is a tool which delivers the DBpedia resources that are being mentioned in the document.
Using DBpedia Spotlight to extract Named Entities from Wikipedia articles and then applying a topic modelling algorithm (e.g. LDA) with URIs of DBpedia resources as features would result in a model, which is capable of describing the documents with the proportions of the topics covering them. But because the topics are also represented by DBpedia URIs, this approach could result in a novel RDF hierarchy and ontology with insights for further analysis of the emerged subgraphs.
The direct implication and first application scenario for this project would be utilizing the inference engine in DBpedia Spotlight, as an additional step after the document has been annotated and predicting its topic coverage. Mentor: Alexandru Todor (FU Berlin)
DBpedia Lookup Improvements by Kunal.Jha
DBpedia is one of the most extensive and most widely used knowledge bases, available in over 125 languages. DBpedia Lookup is a web service that allows users to obtain various DBpedia URIs for a given label (keywords/anchor texts). The service provides two different types of search APIs, namely Keyword Search and Prefix Search. The lookup service currently returns the query results in XML (default) and JSON formats and works on the English language. It is based on a Lucene index providing a weighted label lookup, which combines string similarity with a relevance ranking in order to find the most relevant matches for a given label. As a part of GSoC 2016, I propose to implement improvements with the intention of making the system more efficient and versatile. Mentor: Axel Ngonga (AKSW)
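The idea of a weighted label lookup that combines string similarity with a relevance ranking can be illustrated with a toy Python sketch. This is not the Lucene-based implementation of DBpedia Lookup: the in-memory `INDEX`, its relevance scores, the `weight` parameter and both function names are assumptions made purely for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical index entries: (label, URI, relevance). Relevance values are made up.
INDEX: List[Tuple[str, str, float]] = [
    ("Berlin", "http://dbpedia.org/resource/Berlin", 0.9),
    ("Berlin Wall", "http://dbpedia.org/resource/Berlin_Wall", 0.6),
    ("Bern", "http://dbpedia.org/resource/Bern", 0.5),
]


def prefix_search(prefix: str, index=INDEX) -> List[str]:
    """Return URIs whose label starts with the prefix, most relevant first."""
    p = prefix.lower()
    hits = [entry for entry in index if entry[0].lower().startswith(p)]
    return [uri for _, uri, _ in sorted(hits, key=lambda e: e[2], reverse=True)]


def keyword_search(query: str, index=INDEX, weight: float = 0.5) -> List[str]:
    """Rank all entries by a weighted mix of label similarity and relevance."""

    def score(entry: Tuple[str, str, float]) -> float:
        label, _, relevance = entry
        similarity = SequenceMatcher(None, query.lower(), label.lower()).ratio()
        return weight * similarity + (1.0 - weight) * relevance

    return [uri for _, uri, _ in sorted(index, key=score, reverse=True)]


prefix_hits = prefix_search("berl")
keyword_hits = keyword_search("Berlin")
```

The `weight` parameter makes the trade-off explicit: a value close to 1 favours labels that closely match the query string, while a value close to 0 favours resources with a high relevance score.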
This project aims at finding mappings between the classes (eg. dbo:Person, dbo:City) in the DBpedia ontology and infobox templates on pages of Wikipedia resources using machine learning. Mentor: Nilesh Chakraborty (University of Bonn)
This project is about integrating RML in the DBpedia extraction framework. DBpedia is derived from Wikipedia infoboxes using the extraction framework and mappings defined using the wikitext syntax. A next step would be replacing the wikitext-defined mappings with RML. To accomplish this, adjustments will have to be made to the extraction framework. Mentor: Dimitris Kontokostas (AKSW/KILT)
The List Extractor by FedBai
The project focuses on the extraction of relevant but hidden data which lies inside lists in Wikipedia pages. The information is unstructured and thus cannot be easily used to form semantic statements and be integrated in the DBpedia ontology. Hence, the main task consists in creating a tool which can take one or more Wikipedia pages with lists within as an input and then construct appropriate mappings to be inserted in a DBpedia dataset. The extractor must prove to work well on a given domain and to have the ability to be expanded to reach generalization. Mentor: Marco Fossati (SpazioDati)
The Table Extractor by s.papalini
Wikipedia is full of data hidden in tables. The aim of this project is to explore the possibilities of taking advantage of all the data represented as tables in Wiki pages, in order to populate the different versions of DBpedia with new data of interest. The Table Extractor has to be the engine of this data “revolution”: it would achieve the final purpose of extracting the semi-structured data from all those tables now scattered across most of the Wiki pages. Mentor: Marco Fossati (SpazioDati)
At the beginning of September 2016 you will receive news about successful Google Summer of Code 2016 student projects. Stay tuned and follow us on Facebook and Twitter or visit our website for the latest news.
Your DBpedia Association
Friday, April 1, 2016 - 10:42am
We proudly present our new 2015-10 DBpedia release, which is available now via: http://dbpedia.org/sparql. Go and check it out!
This DBpedia release is based on updated Wikipedia dumps dating from October 2015 featuring a significantly expanded base of information as well as richer and cleaner data based on the DBpedia ontology.
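For readers who want to try the endpoint mentioned above, the following Python sketch builds a request URL for a small query. It only constructs the URL and does not send it; the example query, the chosen resource and the helper name `build_request_url` are illustrative assumptions. The `query` parameter follows the standard SPARQL protocol, and the result format would typically be negotiated via the HTTP `Accept` header.

```python
from urllib.parse import parse_qs, urlencode, urlparse

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named in the announcement

# Illustrative query: birth place(s) of one resource via the DBpedia ontology.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?place WHERE {
  <http://dbpedia.org/resource/Ada_Lovelace> dbo:birthPlace ?place .
} LIMIT 5
""".strip()


def build_request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Return a GET URL carrying the query in the standard 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(QUERY)

# Round-trip check: decoding the URL gives back the original query text.
decoded_query = parse_qs(urlparse(url).query)["query"][0]
```

The resulting URL can be opened in a browser or passed to any HTTP client.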
So, what did we do?
The DBpedia community added new classes and properties to the DBpedia ontology via the mappings wiki. The DBpedia 2015-10 ontology encompasses
- 739 classes (DBpedia 2015-04: 735)
- 1,099 properties with reference values (a/k/a object properties) (DBpedia 2015-04: 1,098)
- 1,596 properties with typed literal values (a/k/a datatype properties) (DBpedia 2015-04: 1,583)
- 132 specialized datatype properties (DBpedia 2015-04: 132)
- 407 owl:equivalentClass and 222 owl:equivalentProperty mappings to external vocabularies (DBpedia 2015-04: 408 and 200, respectively)
The editors community of the mappings wiki also defined many new mappings from Wikipedia templates to DBpedia classes. For the DBpedia 2015-10 extraction, we used a total of 5553 template mappings (DBpedia 2015-04: 4317 mappings). For the first time the top language, gauged by number of mappings, is Dutch (606 mappings), surpassing the English community (600 mappings).
And what are the (breaking) changes?
- English DBpedia switched to IRIs from URIs.
- The instance-types dataset is now split into two files:
- “instance-types” contains only direct types.
- “instance-types-transitive” contains transitive types.
- The “mappingbased-properties” file is now split into three (3) files:
- “mappingbased-literals” contains mapping based statements with literal values.
- We added a new extractor for citation data.
- All datasets are available in .ttl and .tql serializations.
- We are providing DBpedia as a Docker image.
- From now on, we provide extensive dataset metadata by adding DataIDs for all extracted languages to the respective language directories.
- In addition, we revamped the dataset table on the download page. It’s created dynamically based on the DataID of all languages. Likewise, the tables on the statistics page are now based on files providing information about all mapping languages.
- From now on, we also include the original Wikipedia dump files (‘pages_articles.xml.bz2’) alongside the extracted datasets.
- A complete changelog can always be found in the git log.
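To illustrate the switch from URIs to IRIs listed above: in a URI, non-ASCII characters appear as percent-escaped UTF-8 bytes, whereas an IRI carries the characters directly. The following is a minimal sketch of that relationship in the sense of RFC 3987. It is not the extraction framework's own code; the helper names and the example resource name are hypothetical.

```python
import re
from urllib.parse import quote

# One or more consecutive percent-escapes, e.g. "%C3%A9".
_ESCAPES = re.compile(r"(?:%[0-9A-Fa-f]{2})+")

# Characters left untouched when converting an IRI back to a URI.
_SAFE = ":/?#[]@!$&'()*+,;=%"


def _decode_run(match):
    """Decode a run of escapes, keeping ASCII characters escaped."""
    raw = bytes.fromhex(match.group(0).replace("%", ""))
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return match.group(0)  # not valid UTF-8: leave the run as it is
    return "".join(
        ch if ord(ch) > 127 else "%{:02X}".format(ord(ch)) for ch in text
    )


def uri_to_iri(uri: str) -> str:
    """Turn percent-escaped non-ASCII characters into literal characters."""
    return _ESCAPES.sub(_decode_run, uri)


def iri_to_uri(iri: str) -> str:
    """Percent-escape non-ASCII characters as UTF-8 bytes."""
    return quote(iri, safe=_SAFE)


if __name__ == "__main__":
    # Hypothetical resource name, used only for illustration.
    print(uri_to_iri("http://dbpedia.org/resource/Caf%C3%A9"))
```

Escapes of ASCII characters such as `%20` are deliberately left in place, so that reserved characters do not change meaning. Consumers that compare identifiers as plain strings may need a normalization step like this when mixing data from before and after the switch.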
And what about the numbers?
Altogether the new DBpedia 2015-10 release consists of 8.8 billion (2015-04: 6.9 billion) pieces of information (RDF triples) out of which 1.1 billion (2015-04: 737 million) were extracted from the English edition of Wikipedia, 4.4 billion (2015-04: 3.8 billion) were extracted from other language editions, and 3.2 billion (2015-04: 2.4 billion) came from DBpedia Commons and Wikidata. In general we observed a significant growth in raw infobox and mapping-based statements of close to 10%. Thorough statistics are available via the Statistics page.
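The release-over-release growth implied by the triple counts above can be checked with a few lines of arithmetic. This is a small sketch that uses only the rounded figures quoted in this post; the dictionary layout and function name are illustrative.

```python
# (2015-04, 2015-10) triple counts in billions, as quoted above.
TRIPLES = {
    "total": (6.9, 8.8),
    "English Wikipedia": (0.737, 1.1),
    "other language editions": (3.8, 4.4),
    "DBpedia Commons and Wikidata": (2.4, 3.2),
}


def growth_percent(old: float, new: float) -> float:
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100.0


def parts_sum(release_index: int) -> float:
    """Sum of the three partial counts for one release (0 or 1)."""
    return sum(v[release_index] for k, v in TRIPLES.items() if k != "total")


if __name__ == "__main__":
    for name, (old, new) in TRIPLES.items():
        print(f"{name}: {old} -> {new} billion "
              f"({growth_percent(old, new):+.1f}%)")
    print(f"sum of 2015-10 parts: {parts_sum(1):.1f} billion")
```

Two things are worth noting when reading the output. The three 2015-10 parts add up to 8.7 rather than 8.8 billion, which is consistent with each figure having been rounded separately. The "close to 10%" in the text refers to raw infobox and mapping-based statements only, so it is not expected to match the growth in total triples.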
And what’s up next?
We will be working to move away from the mappings wiki, but we will have at least one more mapping sprint. Moreover, we have some cool ideas for GSoC this year. Additional mentors are more than welcome.
And who is to blame for the new release?
We want to thank all editors that contributed to the DBpedia ontology mappings via the Mappings Wiki, all the GSoC students and mentors working directly or indirectly on the DBpedia release and the whole DBpedia Internationalization Committee for pushing the DBpedia internationalization forward.
Special thanks go to Markus Freudenberg and Dimitris Kontokostas (University of Leipzig), Volha Bryl (University of Mannheim / Springer), Heiko Paulheim (University of Mannheim), Václav Zeman and the whole LHD team (University of Prague), Marco Fossati (FBK), Alan Meehan (TCD), Aldo Gangemi (LIPN University, France & ISTC-CNR, Italy), Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink Software), OpenLink Software (http://www.openlinksw.com/), Ruben Verborgh from Ghent University – iMinds, Ali Ismayilov (University of Bonn), Vladimir Alexiev (Ontotext) and members of the DBpedia Association, the AKSW and the department for Business Information Systems of the University of Leipzig for their commitment in putting tremendous time and effort into getting this done.
The work on the DBpedia 2015-10 release was financially supported by the European Commission through the project ALIGNED – quality-centric, software and data engineering (http://aligned-project.eu/).
Have fun and all the best!
Have you backlinked your data yet? – A retrospective of the 6th DBpedia community meeting in The Hague
Wednesday, March 23, 2016 - 2:09pm
We thought it was about time to go orange again, to meet the Dutch DBpedia Chapter, and to celebrate the growing Dutch DBpedia community. Thus, following our successful US event last November, the National Library of the Netherlands hosted the 6th DBpedia community meeting in The Hague on February 12th.
First and foremost, we would like to thank TNO for organizing the pre-event and the National Library of the Netherlands, especially Menno Rasch (Director of KB operations), for sponsoring the catering during the DBpedia community meeting.
Before diving into DBpedia topics, we had a welcome reception on February 11th with snacks and drinks at TNO – New Babylon. Around 40 people from the DBpedia community, members from TNO and its Data Science Department and representatives from the Platform Linked Data Netherlands engaged in vital exchanges about Linked Data topics in the Netherlands.
Sebastian Hellmann gave a short introduction to DBpedia and the recently founded DBpedia Association. After Jean-Louis Roso talked about the TNO Data Science Department and its current developments and projects, Erwin Folmer presented the platform Linked Data Netherlands (PiLOD).
A poster and demo session right after gave people from TNO the opportunity to present and discuss projects currently carried out at TNO.
Below is a short list of the poster presentations given during the pre-event:
- The Smart Appliances REFerence ontology (SAREF): Standardization in IoT
- Linked Data in Horticulture
- GOOSE: Semantic search in Image Retrieval
- Logistics: Ontologies for the Physical Internet
- SWELL: Smart Reasoning for Well Being
The social gathering with snacks and drinks that followed encouraged talks about current developments in the DBpedia community and about ongoing projects. According to TNO representative Laura Daniele, the pre-event was very successful. She summarized the evening of the welcome reception: “It was very inspiring to see the DBpedia community in action. There were lots of interesting projects that use DBpedia as well as lively discussions on the challenges faced by the community, and of course, the event was a great opportunity for networking!”
Following the pre-event, the main event attracted 95 participants and featured special sessions dedicated to DBpedia showcases, the DBpedia ontology, and the challenges of DBpedia and digital heritage.
During the opening session, Menno Rasch, host of the meeting and Director of KB operations, highlighted the importance of raising awareness of the DBpedia brand in order to build a DBpedia community.
The newly founded DBpedia Association and the related new charter regulating organizational issues in the DBpedia community were one of the focal points during the early morning hours, right before several interesting keynote presentations opened the discussion about DBpedia and its usage in the Netherlands.
Marco de Niet, representative of Digital Heritage Foundation (DEN Foundation), the Dutch knowledge centre for digital heritage, talked about “the National Strategy for Digital Heritage in the Netherlands”.
Marco Brattinga and Arjen Santema from the Land Registry and Mapping Agency (Kadaster) presented a framework to describe the data and metadata in registration in relation to a concept schema that describes what the registration is about. Apart from the ideas behind the framework, their presentation included a showcase of examples from the cadastral registration as well as the topographic map and the information node addresses and buildings.
The morning session was closed by Paul Groth from Elsevier, who gave a presentation about knowledge graph construction and the role of DBpedia and other Wikipedia-based knowledge. He discussed the importance of structured data as a key to coordinating data in order to build better taxonomies. He also pointed to the importance of having an up-to-date, publicly available knowledge graph as a reference for constructing internal knowledge graphs.
After Lunch Track
DBpedia is one of the biggest and most important focal points of the Linked Open Data movement. Thus, the after-lunch track focused very much on DBpedia usage during the dedicated showcase session, which started with the new DBpedia & DBpedia+ Data Stack release (planned for 2016-04).
Afterwards, the session continued with further DBpedia-related discussions, which tackled various practical DBpedia matters such as DBpedia in the EUROPEANA Food and Drink project, the use of DBpedia for improved vaccine information systems, and the use of Elasticsearch + DBpedia to maintain a searchable database of global power plants.
The afternoon track came along with four DBpedia highlight sessions, namely DBpedia and Ontologies, DBpedia and Heritage, DBpedia hands-on development and DBpedia and NLP. Firstly, the DBpedia ontology group discussed possible ontology usages and presented the results of the latest DBpedia Ontology survey. Secondly, during the 75-minute DBpedia and Heritage session, special challenges and opportunities of reference data for digital heritage were addressed by experts from EUROPEANA, iMinds, RCE and KB, the National Library of the Netherlands. Thirdly, members of the DBpedia Association and the AKSW/KILT group from Leipzig led a practical session for developers and DBpedia enthusiasts to talk about technical issues and challenges in DBpedia, and also held a tutorial session for DBpedia newbies.
The end of the event was dedicated to NLP and the application of Linked Data to language technologies, especially entity linking. These topics are of vital importance for the research of AKSW/KILT members at the University of Leipzig.
Below is a list of all presentations given during the meeting.
- Sebastian Hellmann, DBpedia Association AKSW/KILT – Have you Backlinked your Data yet?
- Marco de Niet, DEN Foundation – Digital Heritage in the Netherlands
- Marco Brattinga and Arjen Santema, Land Registry and Mapping Agency (Kadaster) – Keynote #1:
- Paul Groth, Elsevier – Knowledge Graph Construction and the Role of DBPedia
- Antoine Isaac, Europeana – Enriching Cultural Heritage Data with DBpedia
- Patrik Schneider, Siemens and WU Wien – DBpedia Wayback Machine
- Richard Nagelmaeker, – BlueSky – Knowledge Diviner – DBpedia demo
- Laura Daniele, TNO – GOOSE
- Christina Unger, CITEC – DBlexipedia: A nucleus for a multilingual lexical Semantic Web
- Raphael Boyer, DBpedia FR / INRIA – DBpedia Historic data
- Chris Davis – Using Elasticsearch + DBpedia to maintain a searchable database of global power plants
- Ali Khalili – Linked Data Reactor
- Vladimir Alexiev, Ontotext – Using DBPedia in Europeana Food and Drink
- Nilesh Chakraborty, AKSW/KILT – FREME – Open Framework of e-Services for Multilingual and Semantic Enrichment of Digital Content.
- Monika Solanki, University of Oxford – Using DBpedia for improved Vaccine Information Systems
- Ralph Schäfermeier and Alexandru Todor, FU Berlin – WebProtégé demo & aspect oriented programming
- Gerard Kuys, Ordina – Classification Ontology
- Vladimir Alexiev, Ontotext – DBpedia mappings quality problems
- Enno Meijers, Dutch DBpedia – DBpedia & Heritage: Challenges and opportunities of reference data for digital heritage
- Hugo Manguinhas, Europeana – Building an ecosystem of networked references
- Anastasia Dimou, iMinds – RML – generating high quality Linked Data
- Joop Vanderheiden, RCE – Histograph: geocoding places of the past
- Olaf Janssen, KB – “Illegal newspapers in the WOII” Wikipedia/DBpedia project
- Christina Unger, CITEC – Towards a Linguistic Linked Data Ecosystem (Results of the LIDER project)
- Giuseppe Futia: TellMeFirst – TellMeFirst A Knowledge Domain Discovery Framework
- Chris Davis – Mapping the Bio-economy using DBpedia Spotlight
Summing up, the 6th community meeting brought together more than 95 DBpedia enthusiasts from the Netherlands and Europe, who engaged in vital conversations about interesting projects and approaches to questions and problems revolving around DBpedia, not only during the dedicated sessions but also during networking breaks. The recently founded DBpedia Association was strongly represented with presentations from Sebastian Hellmann, Dimitris Kontokostas, Nilesh Chakraborty, as well as Markus Freudenberg.
Finally, we would like to thank the organizers Enno Meijers, Richard Nagelmaeker, Gerald Wildenbeest, Gerard Kuys, Monika Solanki and representatives of the DBpedia Association such as Dimitris Kontokostas and Sebastian Hellmann for devoting their time to the organization of the meeting and the programme. We are now looking forward to the 7th DBpedia Community Meeting, which will be held in the city of Leipzig again, during the Semantics conference on September 15th, 2016.