Blog

More than 140 DBpedia enthusiasts joined the Community Meeting in Amsterdam.

Friday, September 22, 2017 - 5:26pm

After the success of the last two community meetings in Sunnyvale and Galway, we thought it was time to go Orange again. During SEMANTiCS 2017 in Amsterdam, Sep 11-14, the DBpedia Community met on the 14th of September. First and foremost, we would like to thank the Institute for Applied Informatics for supporting our community, and the Meervaart Theatre and SEMANTiCS for hosting our community meeting.

picture by Andrea Volpini

Opening Session

Chris Welty

During the opening session, Chris Welty, Google Researcher, presented Even the Changes Are Changing: A New Age of Cognitive Computing. He introduced the impact and challenges of question answering and AI, as well as the development of Jeopardy through technical changes. Victor de Boer from the VU University talked about Semantic Technology for Development: Semantic Web without the Web? He demonstrated the use of semantic technology in the challenging technical environment of developing countries. Both talks illustrated the ever-growing importance of semantic technology and AI, each placed at opposite ends of the technology spectrum, from Raspberry Pis to high-performance clusters.

 

Showcase Session

The DBpedia Showcase Session started with an interactive interview. Sebastian Hellmann (AKSW/KILT) talked with Jan-Bart de Vreede (Kennisnet, former member of the Wikimedia Foundation) about the challenges of growing an open community and creating a more formal structure. They discussed advantages, pitfalls and what lessons can be learned from other communities such as Wikimedia. Afterwards Markus Freudenberg (AKSW/KILT) introduced the highlights of the 2016-10 DBpedia Release.

In this session, five speakers presented novel and interesting ways to use DBpedia, including:

  • Virtuoso 8 and Scalable Attribute-based Access Controls (ABAC) by Patrick van Kleef (OpenLink Software)
  • Learning to Associate DBpedia Entities like Humans by Joern Hees (DFKI) (demo)
  • Towards Using UnifiedViews for Executing DBpedia Data Extraction and Curation Tasks by Tomas Knap (Semantic Web Company)
  • Sustainable Linked Data Generation: The Case of DBpedia by Wouter Maroy (imec)
  • Mappings UI by Ismael Rodríguez (Polytechnic University of Catalonia)
Wouter Maroy & Ismael Rodríguez

Parallel Session

As a regular part of the DBpedia Community Meeting, we had two parallel sessions in the afternoon, where DBpedia newbies could learn what DBpedia is and how to use the DBpedia datasets. Participants who wanted to learn DBpedia basics joined the tutorial session by Markus Freudenberg (DBpedia Release Manager). The DBpedia Association Hour provided a platform for the community to discuss the results of the DBpedia Strategy Survey 2017. This survey was prepared by Sören Auer and the DBpedia Board members to find out what the DBpedia Community thinks about DBpedia’s strategic priorities and how the funds of the DBpedia Association should be spent. Although 45 minutes were not enough to review all survey questions, the session proved beneficial thanks to a lively discussion. Better cooperation and communication between the Association and the different national and language chapters was one of the measures the community embraced to facilitate problem solving and DBpedia’s organization.

Afternoon Track

The sessions in the afternoon highlighted two important fields of research and development, namely DBpedia Ontology and DBpedia & NLP. At the DBpedia Ontology Session, Gustavo Publio (AKSW/KILT) presented data quality issues in DBpedia and highlighted the challenges of redesigning the DBpedia Ontology (slides). Wouter Maroy (imec) and Ismael Rodríguez (Polytechnic University of Catalonia) showcased the DBpedia Mappings Front-End Administration, which they created during this year’s Google Summer of Code project. If you are interested in career opportunities at DBpedia, check out Wouter’s success story here.

Gustavo Publio

At the same time, Milan Dojchinovski (AKSW/KILT) chaired the DBpedia & NLP session with five very interesting talks. In the following you will find all presentations given during this session:

Dutch DBpedia Hour & Joint Workshop

Enno Meijers (National Library of the Netherlands) chaired the Dutch DBpedia Hour. In this open session, members of the Dutch DBpedia Language Chapter discussed tasks and responsibilities for sustaining and developing the Dutch DBpedia, as well as communication, technical infrastructure and content improvement of the DBpedia Dutch Language Chapter. The reference for this discussion was the tasks and responsibilities stated in the Memorandum of Understanding signed by Huygens ING, Koninklijke Bibliotheek, Vrije Universiteit Amsterdam, imec and Beeld en Geluid. The outcome of this session was an agreement on the approach for creating an operational plan.

Simultaneously, DBpedia held a joint session with the Workshop “Linked Data Quality Assessment and Improvement from Academia to Industry”. The presentations are available below:

Amit Kirschenbaum & Magnus Knuth

In the closing session, Sebastian Hellmann (AKSW/KILT) announced a new collaboration to strengthen the DBpedia NLP Department. Via videostream, we talked with Mike Tung and Filipe Mesquita from diffbot about NLP and relation extraction from Wikipedia articles. If you are interested in the new collaboration, please check diffbot’s slides here.

All slides and presentations are also available on our Website and you will find more feedback and photos about the event on Twitter via #DBpediaAmsterdam17.

We would like to thank the DBpedia Dutch language chapter, especially Enno Meijers (National Library of the Netherlands), Lieke Verhelst (Linked Data Factory, Informagic), Victor de Boer (Vrije Universiteit Amsterdam), Roland Cornelissen (metamatter), Gerald Wildenbeest (Saxion), Gerard Kuys (Ordina), Maarten Brinkerink (The Netherlands Institute for Sound and Vision) as well as Julia Holze (DBpedia Association), Dimitris Kontokostas (DBpedia Chapter Coordinator) and Sebastian Hellmann (AKSW/KILT, DBpedia Association) for devoting their time to curating the program and organizing the meeting.

Special thanks go to Katharina Weissenberg and Anna Keil for supporting the meeting by taking pictures of the community and the event.

We are now looking forward to the 11th DBpedia Community Meeting, which will be held on the 12th of October 2017 in Cupertino, California. Visit our event page for further updates.

So, stay tuned and check Twitter, Facebook and the Website or subscribe to our Newsletter for the latest news and updates.

See you soon!

Yours,

DBpedia Association

GSoC 2017 – Recap and Results

Tuesday, September 12, 2017 - 10:55am

We are very pleased to announce that all of this year’s Google Summer of Code students made it successfully through the program and passed their projects. All code has been submitted, merged and is ready to be examined by the rest of the world.

Marco Fossati, Dimitris Kontokostas, Tommaso Soru, Domenico Potena, Emanuele Storti, Anastasia Dimou, Wouter Maroy, Peng Xu, Sandro Coelho and Ricardo Usbeck, members of the DBpedia Community, did a great job in mentoring 7 students from around the world. All of the students enjoyed the experience they gained during the program and will hopefully continue to contribute to DBpedia in the future.

“GSoC is the perfect opportunity to learn from experts, get to know new communities, design principles and work flows.” (Ram G Athreya)

Now, we would like to take the opportunity to give you a little recap of the projects mentored by DBpedia members during the past months. Just click below for more details.

 

[expander_maker id=”1″ more=”Read more” less=”Read less”]DBpedia Mappings Front-End Administration by Ismael Rodriguez

The goal of the project was to create a front-end application that provides a user-friendly interface so the DBpedia community can easily view, create and administrate DBpedia mapping rules using RML. The developed system includes user administration features, help posts, GitHub mappings synchronization, and rich RML-related features such as syntax highlighting, RML code generation from templates, RML validation, extraction and statistics. Some of these features are possible thanks to the interaction with the DBpedia Extraction Framework. In the end, all the required functionalities and goals have been developed, with many functional tests and the approval of the DBpedia community. The system is ready for production deployment. For further information, please visit the project blog. Mentors: Anastasia Dimou and Wouter Maroy (Ghent University), Dimitris Kontokostas (GeoPhy HQ).

Chatbot for DBpedia by Ram G Athreya

DBpedia Chatbot is a conversational chatbot for DBpedia which is accessible through the following platforms: a Web Interface, Slack and Facebook Messenger.

The bot is capable of responding to users in the form of simple short text messages or through more elaborate interactive messages. Users can communicate with or respond to the bot through text and also through interactions (such as clicking on buttons/links). The bot tries to answer text-based questions of the following types: natural language questions, location information, service checks, language chapters, templates and banter. For more information, please follow the link to the project site. Mentor: Ricardo Usbeck (AKSW).

Knowledge Base Embeddings for DBpedia by Nausheen Fatma

Knowledge base embeddings have been an active area of research. In recent years, a lot of research work such as TransE, TransR, RESCAL, SSP, etc. has been done to obtain knowledge base embeddings. However, none of these approaches has used DBpedia for validation. The project aimed to achieve the following tasks: i) Run the existing techniques for KB embeddings on standard datasets. ii) Create an equivalent standard dataset from DBpedia for evaluations. iii) Evaluate across domains. iv) Compare and analyse the performance and consistency of various approaches on the DBpedia dataset along with other standard datasets. v) Report any challenges that may arise when implementing the approaches for DBpedia. For more information, please follow the links to her project blog and GitHub-repository. Mentors: Tommaso Soru (AKSW) and Sandro Coelho (KILT).
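To give a feel for what such an embedding model computes, here is a minimal sketch of the scoring idea behind TransE, one of the approaches named above: a triple (head, relation, tail) is considered plausible when head + relation lies close to tail in vector space. The function name and the tiny two-dimensional vectors are invented for illustration and are not taken from the project's code.

```python
import math

def transe_score(head, relation, tail):
    """Negative L2 distance between (head + relation) and tail.

    Higher (closer to zero) means the triple is more plausible.
    """
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

# Hypothetical 2-dimensional embeddings, made up for this example.
berlin = [1.0, 0.0]
capital_of = [0.0, 1.0]
germany = [1.0, 1.0]
france = [3.0, 1.0]

plausible = transe_score(berlin, capital_of, germany)
implausible = transe_score(berlin, capital_of, france)
print(plausible > implausible)
```

In a real evaluation the vectors would be learned from the knowledge base rather than written by hand, and candidate tails would be ranked by this score.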

Knowledge Base Embeddings for DBpedia by Akshay Jagatap

The project defined embeddings to represent classes, instances and properties by implementing Random Vector Accumulators with additional features in order to better encode the semantic information held by the Wikipedia corpus and DBpedia graphs. To test the quality of embeddings generated by the RVA, lexical memory vectors of locations were generated and tested on a modified subset of the Google Analogies Test Set. Check out further information via Akshay’s GitHub-repo. Mentors: Tommaso Soru (AKSW) and Xu Peng (University of Alberta).
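The accumulation idea can be sketched roughly as follows, assuming the basic random indexing scheme: every context item gets a fixed sparse random index vector, and an entity's memory vector is the sum of the index vectors of the items it co-occurs with. The function names, the dimensions and the toy contexts are assumptions made for this sketch; the project's actual RVA implementation and its additional features may differ.

```python
import math
import random

def index_vector(dim, nonzeros, rng):
    """Sparse random vector with `nonzeros` entries set to +1 or -1."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((-1, 1))
    return vec

def accumulate(contexts, dim=64, nonzeros=4, seed=0):
    """Build memory vectors by summing the index vectors of co-occurring items."""
    rng = random.Random(seed)
    index = {}   # context item -> its fixed random index vector
    memory = {}  # target entity -> accumulated memory vector
    for target, neighbours in contexts:
        mem = memory.setdefault(target, [0] * dim)
        for item in neighbours:
            if item not in index:
                index[item] = index_vector(dim, nonzeros, rng)
            for i, value in enumerate(index[item]):
                mem[i] += value
    return memory

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

# Toy contexts, invented for illustration.
contexts = [
    ("Amsterdam", ["Netherlands", "city"]),
    ("Rotterdam", ["Netherlands", "city"]),
    ("Galway", ["Ireland", "city"]),
]
memory = accumulate(contexts)
same_context = cosine(memory["Amsterdam"], memory["Rotterdam"])
different_context = cosine(memory["Amsterdam"], memory["Galway"])
```

Entities that appear in the same contexts end up with near-identical memory vectors, which is what makes analogy-style tests on locations possible.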

The Table Extractor by Luca Vergili

Wikipedia is full of data hidden in tables. The aim of this project was to explore ways of exploiting the data presented in tables on Wiki pages in order to populate the different chapters of DBpedia with new data of interest. The Table Extractor is meant to be the engine of this data “revolution”: its final purpose is to extract the semi-structured data from all the tables now scattered across most Wiki pages. On this page you can view the datasets (English and Italian) extracted with the Table Extractor. You can also read the log file to see all operations performed to create the RDF triples. It is also worth looking at this page, which describes the idea behind the project and shows an example result extracted from the log files and the .ttl dataset. For more details see Luca’s GitHub repository. Mentors: Domenico Potena and Emanuele Storti (Università Politecnica delle Marche).

 

Unsupervised Learning of DBpedia Taxonomy by Shashank Motepalli

Wikipedia represents a comprehensive cross-domain source of knowledge with millions of contributors. The DBpedia project tries to extract structured information from Wikipedia and transform it into RDF.

The main classification system of DBpedia depends on human curation, which causes it to lack coverage, resulting in a large number of untyped resources. DBTax provides an unsupervised approach that automatically learns a taxonomy from the Wikipedia category system and extensively assigns types to DBpedia entities, through the combination of several NLP and interdisciplinary techniques. It provides a robust backbone for DBpedia knowledge and has the benefit of being easy to understand for end users. Details about his work and his code can be found on the project’s site. Mentors: Marco Fossati (Università degli Studi di Trento) and Dimitris Kontokostas (GeoPhy HQ).

The Wikipedia List-Extractor by Krishanu Konar

This project aimed to build upon the existing list-extractor project by Federica from GSoC 2016. The project focused on the extraction of relevant but hidden data that lies inside lists in Wikipedia pages. Wikipedia, being the world’s largest encyclopedia, holds a huge amount of information in the form of text. While key facts and figures are encapsulated in the resource’s infobox, and some detailed statistics are present in the form of tables, there is also a lot of data in the form of lists, which are quite unstructured and hence difficult to turn into semantic relationships. The main objective of the project was to create a tool that can extract information from Wikipedia lists and form appropriate RDF triples that can be inserted into the DBpedia dataset. For details on the code and the project, check Krishanu’s blog and GitHub-repository. Mentors: Marco Fossati (Università degli Studi di Trento), Domenico Potena and Emanuele Storti (Università Politecnica delle Marche). [/expander_maker]

We are regularly growing our community through GSoC and can deliver more and more opportunities to you. Ideas and applications for the next edition of GSoC are very much welcome. Just contact us via email or check our website for details.

Again, DBpedia is planning to be a vital part of the GSoC Mentor Summit, from October 13th to 15th, at the Google Campus in Sunnyvale, California. This summit is a way to say thank you to the mentors for the great job they did during the program. Moreover, it is a platform to discuss what can be done to improve GSoC and how to keep students involved in their communities post-GSoC.

And there is more good news to tell. DBpedia wants to meet up with the US community during the 11th DBpedia Community Meeting in California. We are currently working on the program and will keep you posted as soon as registration is open.

So, stay tuned and check Twitter, Facebook and the Website or subscribe to our Newsletter for the latest news and updates.

See you soon!

Yours,

DBpedia Association

Career Opportunities at DBpedia – A Success Story

Thursday, August 24, 2017 - 3:53pm

Google Summer of Code is a global program focused on introducing students to open source software development.

During the 3-month summer break from university, students work on a programming project with an open source organization, like DBpedia.

We have been part of this exciting program for more than 5 years now. Many exciting projects have developed as a result of intense coding during hot summers. By introducing Wouter Maroy, who was a GSoC student in 2016 and is currently a mentor in this year’s program, we would like to give you a glimpse behind the scenes and show you how important the program is to DBpedia.


Success Story: Wouter Maroy

Who are you?

I’m Wouter Maroy, a 23-year-old Master’s student in Computer Science Engineering at Ghent University (Belgium). I’m affiliated with IDLab – imec. Linked Data and Big Data technologies are my two favorite fields of interest. Besides my passion for Computer Science, I like to travel, explore and look for adventures. I’m a student who enjoys his student life in Ghent.

What is your main interest in DBpedia and what was your motivation to apply for a DBpedia project at GSoC 2016?

I took courses during my Bachelor’s with lectures about Linked Data and the Semantic Web, which of course included DBpedia; it’s an interesting research field. Before my GSoC 2016 application I did some work on Semantic Web technologies and on a technology (RML) that was required for a GSoC 2016 project that was listed by DBpedia. I wanted to get involved in Open Source and DBpedia, so I applied.

What did you do?

DBpedia has used a custom mapping language up until now to generate structured data from raw data from Wikipedia infoboxes. A next step was to improve this process towards a more modular and generic approach that leads to higher quality linked data generation. This new approach relied on the integration of RML, the RDF Mapping Language, and was the goal of the GSoC 2016 project I applied for. Understanding all the necessary details about the GSoC project required some effort and research before I started coding. I also had to learn a new programming language (Scala). I had good assistance from my mentors, so this turned out very well in the end. DBpedia’s Extraction Framework, which is used for extracting structured data from Wikipedia, has quite a large codebase. It was the first project of this size I was involved in. I learned a lot from reading its codebase and from contributing by writing code during these months.

Dimitris Kontokostas and Anastasia Dimou were my two mentors. They guided me well throughout the project. I interacted daily with them through Slack, and each week we had a conference call to discuss the project. After many months of research, coding and discussing, we managed to deliver a working prototype at the end of the project. The work we did was presented in Leipzig on the DBpedia day during SEMANTiCS 2016. Additionally, this work will also be presented at ISWC 2017.

You can check out his project here.

How do you currently contribute to improve DBpedia?  

I’m mentoring a GSoC 2017 project together with Dimitris Kontokostas and Anastasia Dimou as a follow-up on the work that was done in our GSoC 2016 project last year. Ismael Rodriguez is the new student who is participating in the project and he has already delivered great work! Besides being a mentor for GSoC 2017, I make sure that the integration of RML into DBpedia is going in the right direction in general (managing, coding,…). For this reason, I worked at the KILT/DBpedia office in Leipzig during summer for 6 weeks. Joining and getting to know the team was a great experience.

What did you gain from the project?

Throughout the project I practiced coding, working in a team, … I learned more about DBpedia, RML, Linked Data and other related technologies. I’m glad I had the opportunity to learn this much from the project. I would recommend it to all students who are curious about DBpedia, who are eager to learn and who want to earn a stipend during summer through coding. You’ll learn a lot and you’ll have a good time!

Final words to future GSoC applicants for DBpedia projects.

Give it a shot! Really, it’s a lot of fun! Coding for DBpedia through GSoC is a great, unique experience and anyone who is enthusiastic about coding and the DBpedia project should definitely go for it.

 

So, follow us on Twitter and Facebook and subscribe to our Newsletter to never miss any information about GSoC 2018 projects or internship opportunities.

 

Yours

 

DBpedia Association

 

Amsterdam awaits you at the next DBpedia Community Meeting!

Thursday, August 17, 2017 - 4:02pm

We are happy to announce that the 10th DBpedia Community Meeting will be held in Amsterdam, Netherlands. During the SEMANTiCS 2017, Sep 11-14, the DBpedia Community will get together on the 14th of September for the DBpedia Day.

What cool things do you do with DBpedia? Present your tools and datasets at the DBpedia Community Meeting. Please submit your proposal in our form.

Highlights

– Keynote by Chris Welty (Google Research NY)

– Keynote by Victor de Boer (VU University)

– DBpedia Association hour & Dutch DBpedia hour

– A session on DBpedia ontology by members of the DBpedia ontology committee

– DBpedia Tutorial Session (For people who want to learn about DBpedia.)

– We will talk with Mike Tung, CEO and founder of diffbot, about the DBpedia NLP department via videostream.

Quick Facts

– Web URL: http://wiki.dbpedia.org/meetings/Amsterdam2017

– Hashtag: #DBpediaAmsterdam17

– When: September 14th, 2017

– Where: Meervaart Theatre, Meer en Vaart 300, 1068 LE Amsterdam, Netherlands

– Call for Contribution: Please submit your proposal in our form.

Tickets

– Attending the DBpedia Community Meeting costs €40 (excl. registration fee and VAT). DBpedia members get free admission; please contact your nearest DBpedia chapter or the DBpedia Association for a promotion code.

– Please check all details here:  http://wiki.dbpedia.org/meetings/Amsterdam2017#tickets

Schedule

Please check our schedule for the 10th DBpedia Community meeting here: http://wiki.dbpedia.org/meetings/Amsterdam2017

Workshop

If you can’t wait until the end of SEMANTiCS, you can already participate in the workshop “Two worlds, one goal: A Reliable Linked Data ecosystem for media” held by DBpedia and Wolters Kluwer on the 11th of September. This half-day workshop aims at exploring major topics for publishers and libraries from DBpedia’s and Wolters Kluwer’s perspective. Both communities will dive into core areas like Interlinking, Metadata and Data Quality and address challenges such as fundamental requirements when publishing data on the web. Did we spark your interest? Check our detailed program here and get your ticket today.

Sponsors and Acknowledgments

– Vrije Universiteit Amsterdam (https://www.vu.nl/en/)

– ALIGNED Project (http://aligned-project.eu/)

– Institute for Applied Informatics (http://infai.org/en/AboutInfAI)

– OpenLink Software (http://www.openlinksw.com/)

– SEMANTiCS Conference Sep 11-14, 2017 in Amsterdam (https://2017.semantics.cc/)

In case you want to sponsor the 10th DBpedia Community Meeting, please contact the DBpedia Association via dbpedia@infai.org.

Organisation

– Enno Meijers, National Library of the Netherlands & Dutch DBpedia

– Lieke Verhelst, Linked Data Factory, Informagic & Dutch DBpedia

– Victor de Boer, Vrije Universiteit Amsterdam & Dutch DBpedia

– Roland Cornelissen, metamatter & Dutch DBpedia

– Gerald Wildenbeest, Saxion & Dutch DBpedia

– Gerard Kuys, Ordina & Dutch DBpedia

– Maarten Brinkerink, The Netherlands Institute for Sound and Vision & Dutch DBpedia

– Julia Holze, DBpedia Association

– Dimitris Kontokostas, DBpedia Chapter Coordinator

– Sebastian Hellmann, AKSW/KILT, DBpedia Association

 

We are looking forward to meeting you in Amsterdam!

Check our website for further updates, follow us on #twitter or subscribe to our newsletter.

Your DBpedia Association

Welcome, Ireland Forever

Monday, July 17, 2017 - 1:42pm

Thanks to LDK2017 for co-hosting the DBpedia Community Meeting

After our 2nd Community Meeting in the US, we delighted the Irish DBpedia Community with the 9th DBpedia Community Meeting, which was co-located with the Language, Data and Knowledge Conference 2017 in Galway at the premises of the NUI Galway.

First and foremost, we would like to thank John McCrae (Insight Centre for Data Analytics, NUI Galway) and the LDK Conference for co-hosting and supporting the event.

 

The focus of this Community Meeting was the Irish DBpedia and Linked Data Community in Ireland. Therefore we invited local data scientists as well as European DBpedia enthusiasts to discuss the state of Irish Linked Data.

The meeting started with two compelling keynotes by Brian Ó Raghallaigh, Dublin City University and Logainm.ie, and Sharon Flynn, NUI Galway and Wikimedia Ireland. Brian presented Logainm.ie, a data use case about placenames in Ireland with a special focus on linked Logainm and machine-readable data.

Brian Ó Raghallaigh

His insightful presentation was followed by Sharon Flynn talking about Wikimedia in Ireland and the challenges of “this monumental undertaking” with particular reference to the Wikimedia Community in Ireland.

Sharon Flynn

For more details on the content of the presentations, follow the links to the slides.

Brian’s and Sharon’s slides

 

 

Showcase Session

Eoin McCuirc

Eoin McCuirc opened the DBpedia Showcase Session with “MY sweet LOD”, an insightful presentation on Linked Open Data in Ireland from the perspective of a statistics office.

Shortly after, Ronald Stamper, Chairman of Measur Ltd., elaborated on semantic normal form, ontologies and the perils of paradigm change.

Ben de Meester

Ben De Meester, from Ghent University, presented the first DBpedia Showcase about Declarative Data Transformation for Linked Data Generation.

This was followed by another showcase by Alan Meehan, who presented the SUMMR Interlink Validation tool, which validates interlinks from a source dataset to multiple targets.

Fred Durao

Closing the Showcase Session, Frederico Araujo Durao, Insight Centre for Data Analytics – University College Cork (UCC), presented a demo of his linked data browser.

 

For further details of the presentations follow the links to the slides.

Parallel sessions

As a regular part of the DBpedia Community Meeting we have two parallel sessions in the afternoon where DBpedia newbies can learn about what DBpedia is and how to use the DBpedia data sets.

Markus Freudenberg giving a DBpedia Tutorial

 

Participants who wanted to learn DBpedia basics joined the DBpedia Tutorial Session by Markus Freudenberg (DBpedia Release Manager). The DBpedia Association Hour provided a platform for the community to discuss and give feedback.

 

 

Sebastian Hellmann and Julia Holze @ the DBpedia Association Hour

Additionally, Sebastian Hellmann and Julia Holze, members of the DBpedia Association, updated the participants on the growing number of DBpedia Association members, the formalized DBpedia language chapters and the established DBpedia Community Committee, and reported on technical developments such as the DBpedia API.

 

Ontology Engineering and Software Alignment in the ALIGNED Project

The afternoon session started with the DBpedia 2016-10 release update by Markus Freudenberg (DBpedia Release Manager). Following this, Kevin Chekov Feeney (Trinity College Dublin) presented the software alignment in the ALIGNED project. He talked about “Generating correct-by-construction semantic datasets from unstructured, semi-structured and badly structured data sources”.

Kevin Feeney – ALIGNED

 

 

At this point, we would also like to thank the ALIGNED project for the development of DBpedia as a project use case and for covering parts of the travel cost.

 

 

Session about Irish Linked Data Projects

Chaired by Rob Brennan and Bianca Pereira, the speakers in the last session presented new Irish Linked Data Projects, for example GeoHive, BIOOPENER and the TCD Open Linked Data Engagement Fund Project. The following panel session gave DBpedia and Linked Data enthusiasts a platform for exchange and discussion. The outcome of this session was a roadmap for Irish Linked Data, created together with all participants.

Below is a list of all presentations of this session:

Closing this session, John McCrae announced that the next edition of the Language, Data and Knowledge (LDK) Conference is scheduled for 2019 in Germany. We at the DBpedia Association are now looking forward to welcoming the LDK Community in Leipzig!

Social Evening Event

The Community Meeting slowly came to an end with our social evening event, which was held at the PorterShed in Galway. The evening session revolved around the topic How to exploit data commercially? and featured two short impulse talks. Paul Buitelaar started the session by presenting “Kibi”, which is an Open Source platform for Data Intelligence based on the search engine Elasticsearch. Finally, Sebastian Hellmann talked about “Improving the Utility of DBpedia by co-designing a public and commercial DBpedia API” (slides).

Summing up, the 9th DBpedia Community Meeting brought together more than 45 DBpedia enthusiasts from Ireland and Europe who engaged in vital discussions about Linked Data, DBpedia use cases and services.

You can find feedback about the event on Twitter via  #DBpediaGalway17.

We would like to thank Bianca Pereira and Caoilfhionn Lane from Insight Centre for Data Analytics, NUI Galway, as well as Rob Brennan from ADAPT Research Centre, Trinity College Dublin, for devoting their time to curating the program and organizing the meeting.

Special thanks go to LDK 2017 for hosting the meeting.

Thanks Ireland and hello Amsterdam!

We are looking forward to the next DBpedia Community Meeting which will be held in Amsterdam, Netherlands. Co-located with the SEMANTiCS17, the Community will get together on the 14th of September on the DBpedia Day.

 

Check our website for further updates, follow us on #twitter or subscribe to our newsletter.

Your DBpedia Association

Results of the DBpedia Strategy Survey 2017

Tuesday, July 11, 2017 - 11:46am

Sören Auer and the DBpedia Board members prepared a survey to assess the direction of the DBpedia Association. We wanted to know what the DBpedia Community thinks about DBpedia’s strategic priorities and how the funds of the DBpedia Association should be spent. Between February 2017 and April 2017, a total of 40 members of the DBpedia Community actively participated in the survey and voted as follows:

1. What should be the priorities of the DBpedia Association in the next year?

To give an overview of the various priorities mentioned, the following digest groups the answers into four categories. The most frequent answer was to increase data quality, followed by enlarging the DBpedia Community through broader dissemination.

2. What should be the priorities of the DBpedia Association in the next three years?

In contrast to question one, this question concerns the priorities the DBpedia Association should focus on during the next three years. As in the previous overview, the specified priorities are divided into four categories.

3. What is your main interest in DBpedia?

The chart above depicts the main interests in DBpedia. The majority of participants have an “academic & professional” (45.7%) interest in DBpedia, followed by “professional” (28.6%) and “academic” (20.0%) interests. Only 2.9% of the answers are student-related interests.

4. How should the funds of the association be used?

With respect to “How should the funds of the association be used?”, most participants chose “service provisioning”. The “development of new DBpedia features” was the second most popular choice. Nevertheless, “Community building” and “release production” also scored many votes.

5. How should the DBpedia Association collaborate with national/language chapters?

  • Agreeing on strategic goals; making sure that national contributions can be spread to other chapters, thus increasing the overall usability of DBpedia; keeping track of good practices
  • Facilitating grassroots initiatives – so mainly promote and stimulate national/language initiatives
  • Local events related to DBpedia tasks
  • Regular events to share ideas and data
  • Join other languages members onto DBpedia
  • As an umbrella organization: support, mediation, and representation
  • Regular exchange and involvement
  • Consult, try to figure out common priorities

6. Should DBpedia open itself to contain and curate more data not directly extracted from Wikipedia?

As the chart above clearly depicts, more than half of the participants are in favor of DBpedia comprising datasets not directly derived or extracted from Wikipedia. In contrast, 34.3% hold the opposite opinion and would prefer DBpedia to focus solely on data extraction from Wikipedia.

  • If yes, which other datasets should DBpedia prioritize for fusion to improve its coverage and quality?

7. Which of the following features do you consider most important?

The following diagram gives an overview of particular features and their importance from the participants’ point of view. As in the results of question one, data quality is considered the most important issue by the survey participants (23.7%). The second most important features, with 17.2% each, are the provision of datasets extracted from the Wikipedia article text, substantial collaboration/integration with Wikidata, and the provision of better search and exploration user interfaces.

8. Any other question, feedback, opinion, ideas or suggestion you would like to send to the association.

  • KUTGW
  • Increased support of non-RDF publication formats is probably wise as an insurance policy that DBpedia will stay relevant.
  • In users mailing-list being more open-minded in an easy manner and always signalling provocative postings are welcome. And I fear it is a bit late for this survey, but better late than never, my greetings to all making some thoughts about this stuff.
  • DBpedia Spotlight should return Wikidata URIs by default, for stability
  • Use a richer ontology without contradictions, e.g. Book-Physical vs. Book-Conceptual Work

Thank you for your input and your participation! Your priorities and opinions are of vital importance for the future success of DBpedia. We will discuss how to act on your answers during our next DBpedia Board Meetings in order to find a reasonable strategic direction for the DBpedia Association for the next years.

Check our website for further updates, follow us on #twitter or subscribe to our newsletter.

Your

DBpedia Association

New DBpedia Release – 2016-10

Tuesday, July 4, 2017 - 1:53pm

We are happy to announce the new DBpedia Release.

This release is based on updated Wikipedia dumps dating from October 2016.

You can download the new DBpedia datasets in N3 / TURTLE serialisation from http://wiki.dbpedia.org/downloads-2016-10 or directly here http://downloads.dbpedia.org/2016-10/.

This release took us longer than expected. We had to deal with multiple issues and included new data. Most notable is the addition of the NIF annotation datasets for each language, recording the whole wiki text, its basic structure (sections, titles, paragraphs, etc.) and the included text links. We hope that researchers and developers, working on NLP-related tasks, will find this addition most rewarding. The DBpedia Open Text Extraction Challenge (next deadline Mon 17 July for SEMANTiCS 2017) was introduced to instigate new fact extraction based on these datasets.

We want to thank everyone who has contributed to this release by adding mappings, new datasets, extractors or issue reports, helping us to increase the coverage and correctness of the released data. We also thank the European Commission and the ALIGNED H2020 project for funding and general support.

Want to read more about the new release? Click below for further details.[expander_maker id=”1″ more=”Read more” less=”Read less”]

 Statistics

Altogether the DBpedia 2016-10 release consists of 13 billion (2016-04: 11.5 billion) pieces of information (RDF triples) out of which 1.7 billion (2016-04: 1.6 billion) were extracted from the English edition of Wikipedia, 6.6 billion (2016-04: 6 billion) were extracted from other language editions and 4.8 billion (2016-04: 4 billion) from Wikipedia Commons and Wikidata.

In addition, adding the large NIF datasets for each language edition (see details below) increased the number of triples further by over 9 billion, bringing the overall count up to 23 billion triples.
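As a quick sanity check, the figures above can be added up. The sketch below uses only the rounded numbers quoted in this post, expressed in units of 0.1 billion triples so the arithmetic stays exact. The variable and function names are illustrative and not part of any DBpedia tooling.

```python
# Rounded triple counts quoted in this post, in units of 0.1 billion triples.
# Integers keep the sums exact; all names here are illustrative only.
RELEASE_2016_10 = {"english": 17, "other_languages": 66, "commons_and_wikidata": 48}
RELEASE_2016_04 = {"english": 16, "other_languages": 60, "commons_and_wikidata": 40}
NIF_LOWER_BOUND = 90  # "over 9 billion" NIF triples, so only a lower bound


def total_tenths(parts):
    """Sum the per-source counts (still in units of 0.1 billion)."""
    return sum(parts.values())


core_2016_10 = total_tenths(RELEASE_2016_10)
core_2016_04 = total_tenths(RELEASE_2016_04)
with_nif = core_2016_10 + NIF_LOWER_BOUND

print(f"2016-10 without NIF: about {core_2016_10 / 10} billion")
print(f"2016-04 without NIF: about {core_2016_04 / 10} billion")
print(f"2016-10 with NIF: more than {with_nif / 10} billion")
```

The per-source figures add up to roughly 13.1 billion for 2016-10 and 11.6 billion for 2016-04. The small gaps to the quoted totals are presumably due to rounding of the individual figures, and the NIF total is only a lower bound because the post says "over 9 billion".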

Changes

  • The NLP Interchange Format (NIF) aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. To extend the versatility of DBpedia, furthering many NLP-related tasks, we decided to extract the complete human-readable text of any Wikipedia page (‘nif_context’), annotated with NIF tags. For this first iteration, we restricted the extent of the annotations to the structural text elements directly inferable from the HTML (‘nif_page_structure’). In addition, all contained text links are recorded in a dedicated dataset (‘nif_text_links’).
    The DBpedia Association started the Open Extraction Challenge on the basis of these datasets. With this effort, we aim to spur knowledge extraction from Wikipedia article texts in order to dramatically broaden and deepen the amount of structured DBpedia/Wikipedia data and to provide a platform for benchmarking various extraction tools.
    If you want to participate with your own NLP extraction engine, the next deadline, for SEMANTiCS 2017, is July 17th.
    We included an example of these structures in section five of the download page of this release.
  • A considerable amount of work has been done to streamline the extraction process of DBpedia, converting many of the extraction tasks into an ETL setting (using SPARK). We are working in concert with the Semantic Web Company to further enhance these results by introducing a workflow management environment to increase the frequency of our releases.
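To give a rough idea of what the offset-based link annotations described in the list above look like, here is a minimal sketch. It assumes that each text link is recorded as a character span relative to the page text, using property names from the NIF Core and ITS RDF vocabularies (nif:beginIndex, nif:endIndex, nif:anchorOf, nif:referenceContext, itsrdf:taIdentRef). The function name, the example URI and the sample sentence are hypothetical, and the exact shape of the released datasets may differ.

```python
def nif_link_annotations(context_uri, text, anchors):
    """Return one dict per link anchor with NIF-style offset properties.

    `anchors` is a list of (anchor_text, target_uri) pairs in document order.
    This is an illustrative helper, not part of the DBpedia extraction framework.
    """
    annotations = []
    search_from = 0
    for anchor, target in anchors:
        begin = text.find(anchor, search_from)
        if begin == -1:
            continue  # anchor text not found after the previous match; skip it
        end = begin + len(anchor)
        annotations.append({
            "nif:referenceContext": context_uri,
            "nif:beginIndex": begin,   # zero-based offset of the first character
            "nif:endIndex": end,       # offset just past the last character
            "nif:anchorOf": anchor,
            "itsrdf:taIdentRef": target,
        })
        search_from = end
    return annotations


# Hypothetical context URI and sample sentence, for illustration only.
context = "http://example.org/page?nif=context"
page_text = "Leipzig is a city in Germany."
links = nif_link_annotations(
    context,
    page_text,
    [("Leipzig", "dbr:Leipzig"), ("Germany", "dbr:Germany")],
)
for link in links:
    print(link)
```

Storing offsets instead of copies of the surrounding text keeps each annotation small and lets several annotation layers point into the same context string.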

In case you missed it, what we changed in the previous release (2016-04)

  • We added a new extractor for citation data that provides two files:
    • citation links: linking resources to citations
    • citation data: trying to get additional data from citations. This is a quite interesting dataset but we need help to clean it up
  • In addition to the datasets normalised to English DBpedia (en-uris), we provide normalised datasets based on the DBpedia Wikidata (DBw) datasets (wkd-uris). These sorted datasets will be the foundation for the upcoming fusion process with Wikidata. The DBw-based URIs will be the only ones provided from the following releases on.
  • We now filter out triples from the Raw Infobox Extractor that are already mapped. E.g. no more “<x> dbo:birthPlace <z>” and “<x> dbp:birthPlace|dbp:placeOfBirth|… <z>” in the same resource. These triples are now moved to the “infobox-properties-mapped” datasets and not loaded on the main endpoint. See issue 22 for more details.
  • Major improvements in our citation extraction. See here for more details.
  • We incorporated the statistical distribution approach of Heiko Paulheim in creating type statements automatically and providing them as additional datasets (instance_types_sdtyped_dbo).
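The filtering of already mapped raw infobox triples mentioned in the list above can be sketched as follows. This is a simplified illustration, not the actual extraction framework code: it assumes a hypothetical lookup table from raw properties (dbp:) to ontology properties (dbo:) and treats a raw triple as "already mapped" when the same subject and object already occur with the mapped property. The placeholders <x> and <z> follow the example in the text; the dbp:name triple is made up.

```python
# Hypothetical mapping from raw infobox properties to ontology properties.
# In practice such mappings are maintained by the community in the mappings wiki.
RAW_TO_MAPPED = {
    "dbp:birthPlace": "dbo:birthPlace",
    "dbp:placeOfBirth": "dbo:birthPlace",
}


def split_raw_triples(mapped_triples, raw_triples, raw_to_mapped=RAW_TO_MAPPED):
    """Split raw infobox triples into (kept, already_mapped).

    A raw triple is moved to `already_mapped` when its property has a mapped
    counterpart and the same subject/object pair exists with that counterpart.
    """
    mapped = set(mapped_triples)
    kept, already_mapped = [], []
    for subject, prop, obj in raw_triples:
        target = raw_to_mapped.get(prop)
        if target is not None and (subject, target, obj) in mapped:
            already_mapped.append((subject, prop, obj))
        else:
            kept.append((subject, prop, obj))
    return kept, already_mapped


mapped_triples = [("<x>", "dbo:birthPlace", "<z>")]
raw_triples = [
    ("<x>", "dbp:birthPlace", "<z>"),
    ("<x>", "dbp:placeOfBirth", "<z>"),
    ("<x>", "dbp:name", "Some Name"),  # no mapped counterpart, so it is kept
]

kept, already_mapped = split_raw_triples(mapped_triples, raw_triples)
print("kept:", kept)
print("infobox-properties-mapped:", already_mapped)
```

In this sketch the second list plays the role of the “infobox-properties-mapped” datasets: the triples are not discarded, only separated from the data loaded on the main endpoint.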

 

Upcoming Changes

  • DBpedia Fusion: We finally started working again on fusing DBpedia language editions. Johannes Frey is taking the lead in this project. The next release will feature intermediate results.
  • Id Management: Closely related to the DBpedia Fusion project is our effort to introduce our own Id/IRI management, to become independent of Wikimedia-created IRIs. This will not entail changing our domain or entity naming regime, but will make it possible to add entities of any source or scope.
  • RML Integration: Wouter Maroy has already provided the necessary groundwork for switching the mappings wiki to an RML-based approach on GitHub. Last week, Wouter started working exclusively on implementing the Git-based wiki and the conversion of existing mappings. We are looking forward to the results of this process.
  • Further development of SPARK Integration and workflow-based DBpedia extraction, to increase the release frequency.

New Datasets

  • New languages extracted from Wikipedia:

South Azerbaijani (azb), Upper Sorbian (hsb), Limburgan (li), Minangkabau (min), Western Mari (mrj), Oriya (or), Ossetian (os)

  • SDTypes: We extended the coverage of the automatically created type statements (instance_types_sdtyped_dbo) to English, German and Dutch.
  • Extensions: In the extension folder (2016-10/ext) we provide two new datasets (both are to be considered experimental):
    • DBpedia World Facts: This dataset is authored by the DBpedia Association itself. It lists all countries, all currencies in use and (most) languages spoken in the world as well as how these concepts relate to each other (spoken in, primary language etc.) and useful properties like ISO codes (ontology diagram). This dataset extends the very useful LEXVO dataset with facts from DBpedia and the CIA Factbook. Please report any errors or suggestions regarding this dataset to Markus.
    • JRC-Alternative-Names: This resource is a link-based complementary repository of spelling variants for person and organisation names. The data is multilingual and contains up to hundreds of variations per entity. It was extracted from the analysis of news reports by the Europe Media Monitor (EMM) as available on JRC-Names.

 Community

The DBpedia community added new classes and properties to the DBpedia ontology via the mappings wiki. The DBpedia 2016-04 ontology encompasses:

  • 760 classes
  • 1,105 object properties
  • 1,622 datatype properties
  • 132 specialised datatype properties
  • 414 owl:equivalentClass and 220 owl:equivalentProperty mappings to external vocabularies

The editor community of the mappings wiki also defined many new mappings from Wikipedia templates to DBpedia classes. For the DBpedia 2016-10 extraction, we used a total of 5887 template mappings (DBpedia 2015-10: 5800 mappings). The top language, gauged by the number of mappings, is Dutch (648 mappings), followed by the English community (606 mappings).[/expander_maker]

 Credits to

  • Markus Freudenberg (University of Leipzig / DBpedia Association) for taking over the whole release process and creating the revamped download & statistics pages.
  • Dimitris Kontokostas (University of Leipzig / DBpedia Association) for conveying his considerable knowledge of the extraction and release process.
  • All editors that contributed to the DBpedia ontology mappings via the Mappings Wiki.
  • The whole DBpedia Internationalization Committee for pushing the DBpedia internationalization forward.
  • Václav Zeman and the whole LHD team (University of Prague) for their contribution of additional DBpedia types.
  • Alan Meehan (TCD) for performing a big external link cleanup.
  • Aldo Gangemi (LIPN University, France & ISTC-CNR, Italy) for providing the links from DOLCE to DBpedia ontology.
  • SpringerNature for offering a co-internship to a bright student and developing a closer relation to DBpedia on multiple issues, as well as for the links to their SciGraph subjects.
  • Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink Software) for loading the new data set into the Virtuoso instance that provides 5-Star Linked Open Data publication and SPARQL Query Services.
  • OpenLink Software (http://www.openlinksw.com/) collectively for providing the SPARQL Query Services and Linked Open Data publishing infrastructure for DBpedia in addition to their continuous infrastructure support.
  • Ruben Verborgh from Ghent University – imec for publishing the dataset as Triple Pattern Fragments, and imec for sponsoring DBpedia’s Triple Pattern Fragments server.
  • Ali Ismayilov (University of Bonn) for extending and cleaning of the DBpedia Wikidata dataset.
  • All the GSoC students and mentors who have directly or indirectly worked on the DBpedia release.
  • Special thanks to members of the DBpedia Association, the AKSW and the Department for Business Information Systems of the University of Leipzig.

The work on the DBpedia 2016-10 release was financially supported by the European Commission through the project ALIGNED – quality-centric, software and data engineering.

More information about DBpedia can be found at http://dbpedia.org as well as in the new overview article about the project available at http://wiki.dbpedia.org/Publications.

Have fun with the new DBpedia 2016-10 release!

Galway is calling for the next DBpedia Community Meeting.

Wednesday, June 7, 2017 - 1:57pm

We are happy to announce that the 9th DBpedia Community meeting will be held in Galway, Ireland on June 21st 2017. DBpedia will be part of the Language, Data and Knowledge conference (LDK) in Galway. This new biennial conference series aims at bringing together researchers from across disciplines. The DBpedia Meeting is part of the conference and is scheduled for the last day.

Only a few seats are left, so come and get your ticket to be part of the 9th DBpedia Community meeting in Galway.

Highlights

  • Keynote #1: Logainm.ie data use cases by Brian Ó Raghallaigh (Dublin City University & Logainm)
  • Keynote #2: Wikimedia in Ireland: A Monumental Undertaking by Sharon Flynn (NUI Galway & Wikimedia Ireland)
  • DBpedia Association hour
  • A session about Irish Linked data projects (and DBpedia)

Quick Facts

Schedule

Please check our schedule for the 9th DBpedia Community meeting here: http://wiki.dbpedia.org/meetings/Galway2017

Evening Event

The social event will be held in the evening (starting at 6pm) at the PorterShed around the topic How to exploit data commercially? featuring several short impulse talks. We still have some remaining slots and would welcome you to present your success stories as well as use cases, but also tell us about your problems regarding the commercialisation of data. If you are interested in presenting, please email dbpedia@infai.org.

Sponsors and Acknowledgments

LDK2017: for hosting the meeting.
Institute for Applied Informatics: for supporting the DBpedia Association.
OpenLink Software: for continuous hosting of the main DBpedia Endpoint.
ADAPT research centre: for supporting the DBpedia Association.
ALIGNED – Software and Data Engineering: for funding the development of DBpedia as a project use-case and covering part of the travel cost.
PorterShed: for hosting the evening event.

In case you want to sponsor the 9th DBpedia Community Meeting, please contact the DBpedia Association via dbpedia@infai.org.

Organisation

  • Tatiana Gornostay, TILDE
  • Rob Brennan, ADAPT research centre
  • Felix Sasaki, DFKI GmbH
  • Bianca Pereira, The Insight Centre for Data Analytics
  • Caoilfhionn Lane, The Insight Centre for Data Analytics
  • Jimmy O’Regan, ITUT, Trinity College Dublin
  • Julia Holze, DBpedia Association
  • Sandra Prätor, DBpedia Association
  • Sebastian Hellmann, DBpedia Association and AKSW, Uni Leipzig

We are looking forward to meeting you in Galway!

Check our website for further updates, follow us on #twitter or subscribe to our newsletter.

Your DBpedia Association

Smart Minds Wanted

Thursday, June 1, 2017 - 1:31pm

New Internship Opportunity @

In conjunction with Springer Nature, DBpedia offers a three-month internship at Springer Nature in London, UK and at DBpedia in Leipzig, Germany.

Internship Details

Position: DBpedia Intern
Main Employer: DBpedia Association
Deadline: June 30th, 2017
Duration: 3 months, full-time; the internship will start in the second half of 2017
Location: 50% in London (UK) and 50% in Leipzig (GER)
Type of students desired: Undergraduate, Graduate (Junior role)
Compensation: You will receive a stipend of 1300€ per month and additional reimbursement of your travel and visa costs (total up to 1000€)

The student intern will be responsible for assisting with mappings for DBpedia at SpringerNature. Your tasks include, but are not restricted to, improving the quality of the mechanism that extracts DBpedia scholarly references/Wikipedia citations to Springer Nature URIs, and text mining of DBpedia entities from Springer Nature publication content.

Did we spark your interest? Check our website for further information or apply directly via our online application form.

We are looking forward to meeting all the whiz kids out there.

Your

DBpedia Association

GSoC 2017- may the code be with you

Friday, May 5, 2017 - 10:16am

GSoC students have finally been selected.

We are very excited to announce this year’s final students for our projects in the Google Summer of Code program (GSoC).

Google Summer of Code is a global program focused on bringing more student developers into open source software development. Stipends are awarded to students to work on a specific DBpedia related project together with a set of dedicated mentors during summer 2017 for the duration of three months.

For the past 5 years DBpedia has been a vital part of the GSoC program. Since our very first participation, many DBpedia projects have been successfully completed.

In this year’s GSoC edition, DBpedia received more than 20 submissions for selected DBpedia projects. Our mentors read and evaluated many promising proposals, and now the crème de la crème of students have snatched a spot for this summer. In the end, 7 students from around the world were selected and will work together with their assigned mentors on their projects. DBpedia developers and mentors are really excited about these 7 promising student projects.

List of students and projects:

Want to read more about their specific projects? Just click below… or check the GSoC pages for details.[expander_maker id=”1″ more=”Read more” less=”Read less”]

Ismael Rodriguez – Project Description: Although the DBpedia Extraction Framework was adapted to support RML mappings thanks to one of last year’s GSoC projects, the user interface for creating mappings is still a MediaWiki installation, which does not support RML mappings and requires Semantic Web expertise. The goal of the project is to create a front-end application that provides a user-friendly interface so the DBpedia community can easily view, create and administrate DBpedia mapping rules using RML. Moreover, it should also facilitate data transformations and overall DBpedia dataset generation.

Mentors: Anastasia Dimou, Dimitris Kontokostas, Wouter Maroy

Ram Ganesan Athreya – Project Description: The requirement of the project is to build a conversational Chatbot for DBpedia which would be deployed in at least two social networks. There are three main challenges in this task. First is understanding the query presented by the user, second is fetching relevant information based on the query through DBpedia and finally tailoring the responses based on the standards of each platform and developing subsequent user interactions with the Chatbot. Based on my understanding, the process of understanding the query would be undertaken by one of the mentioned QA Systems (HAWK, QANARY, openQA). Based on the response from these systems we need to query the DBpedia dataset using SPARQL and present the data back to the user in a meaningful way. Ideally, both the presentation and interaction flow need to be tailored for the individual social network. I would like to stress that although the primary medium of interaction is text, platforms such as Facebook insist that a proper mix between chat and interactive elements such as images, buttons etc. would lead to better user engagement. So I would like to incorporate these elements as part of my proposal.

Mentor: Ricardo Usbeck

Nausheen Fatma – Project Description: Knowledge base embeddings have been an active area of research. In recent years a lot of research work such as TransE, TransR, RESCAL, SSP, etc. has been done to get knowledge base embeddings. However, none of these approaches have used DBpedia to validate their approach. In this project, I want to achieve the following tasks: i) Run the existing techniques for KB embeddings for standard datasets. ii) Create an equivalent standard dataset from DBpedia for evaluations. iii) Evaluate across domains. iv) Compare and analyse the performance and consistency of various approaches for the DBpedia dataset along with other standard datasets. v) Report any challenges that may come up when implementing the approaches for DBpedia. Along the way, I would also try my best to come up with any new research approach for the problem.

Mentors: Sandro Athaide Coelho, Tommaso Soru

Akshay Jagatap – Project Description: The project aims at defining embeddings to represent classes, instances and properties. Such a model tries to quantify semantic similarity as a measure of distance in the vector space of the embeddings. I believe this can be done by implementing Random Vector Accumulators with additional features in order to better encode the semantic information held by the Wikipedia corpus and DBpedia graphs.

Mentors: Pablo Mendes, Sandro Athaide Coelho, Tommaso Soru

Luca Virgili – Project Description: In Wikipedia a lot of data are hidden in tables. What we want to do is to read correctly all tables in a page. First of all, we need a tool that can allow us to capture the tables represented in a Wikipedia page. After that, we have to understand what we read previously. Both these operations seem easy, but there are many problems that could arise. The main issue that we have to solve is due to how people build tables. Everyone has a particular style for representing information, so in some tables we can read something that doesn’t appear in another structure. In this paper I propose to improve last year’s project and to create a general way of reading data from Wikipedia tables. I want to review the parser for Wikipedia pages to understand as many types of tables as possible. Furthermore, I’d like to build an algorithm that can compare the column’s elements (that have been read previously by the parser) to an ontology so it could realize how the user wrote the information. In this way we need to define only a few mapping rules, and we can make the software more general.

Mentors: Emanuele Storti, Domenico Potena

Shashank Motepalli – Project Description: DBpedia tries to extract structured information from Wikipedia and make information available on the Web. In this way, the DBpedia project develops a gigantic source of knowledge. However, the current system for building the DBpedia Ontology relies on Infobox extraction. Infoboxes, being human curated, limit the coverage of DBpedia. This occurs either due to a lack of Infoboxes in some pages or due to over-specific or very general taxonomies. These factors have motivated the need for DBTax. DBTax follows an unsupervised approach to learning taxonomy from the Wikipedia category system. It applies several inter-disciplinary NLP techniques to assign types to DBpedia entities. The primary goal of the project is to streamline and improve the approach which was proposed, making it easy to run on a new DBpedia release. In addition to this, the project will also work on extending the taxonomy learning of DBTax to other Wikipedia languages.

Mentors: Marco Fossati, Dimitris Kontokostas

Krishanu Konar – Project Description: Wikipedia, being the world’s largest encyclopedia, has a humongous amount of information present in the form of text. While key facts and figures are encapsulated in the resource’s infobox, and some detailed statistics are present in the form of tables, there’s also a lot of data present in the form of lists, which are quite unstructured and hence difficult to form into a semantic relationship. The project focuses on the extraction of relevant but hidden data which lies inside lists in Wikipedia pages. The main objective of the project would be to create a tool that can extract information from Wikipedia lists and form appropriate RDF triples that can be inserted in the DBpedia dataset.

Mentor: Marco Fossati [/expander_maker]

Congrats to all selected students! We will keep our fingers crossed now and patiently wait until early September, when final project results are published.

An encouraging note to the less successful students.

The competition for GSoC slots is always at a very high level, and DBpedia only has a limited number of slots available for students. In case you weren’t among those selected, do not give up on DBpedia just yet. There are plenty of opportunities to prove your abilities and be part of the DBpedia experience. You, above all, know DBpedia by heart. Hence, contributing to our support system is not only a great way to be part of the DBpedia community but also an opportunity to be vital to DBpedia’s development. Above all, it is a chance for current DBpedia mentors to get to know you better. It will give your future mentors a chance to support you and help you to develop your ideas from the very beginning.

Go on you smart brains, dare to become a top DBpedia expert and provide good support for other DBpedia Users. Sign up to our support page or check out the following ways to contribute:

Get involved:
  • Join our DBpedia-discussion mailing list, where we discuss current DBpedia developments. NOTE: mails announcing tools or calls for papers unrelated to DBpedia are not allowed. This is a community discussion list.
  • If you would like to join the DBpedia developer and technical discussions, sign up on Slack
  • Developer Discussion
  • Become a DBpedia Student and sign up for free at the DBpedia Association. We offer special programs that provide training and other opportunities to learn about DBpedia and extend your Semantic Web and programming skills

We are looking forward to working with you!

Can’t get enough of DBpedia? Stay tuned and join us on facebook, twitter or subscribe to our newsletter for the latest news!

Have a great weekend!

Your

DBpedia Association
