7 December 2009

Web search and ‘everyware’ targeted by Oxford scientists

Oxford research could lead to a new kind of search engine

Two new projects launched by Oxford University computer scientists aim to tackle some of the thorniest problems in technology: extracting better information from the Internet and making many tiny computing devices work together.

DIADEM, led by Professor Georg Gottlob, and VERIWARE, led by Professor Marta Kwiatkowska, were recently awarded Advanced Investigator Grants from the European Research Council worth a total of €4.4m.

DIADEM (Domain-centric Intelligent Automated Data Extraction Methodology) sets out to solve the problem of extracting complex, structured information from large numbers of websites.

‘If we succeed, DIADEM will be the next major step forward in web search technology,’ said Professor Georg Gottlob of Oxford University’s Computing Laboratory. ‘It will boost individual and corporate web users’ ability to get the information they need from the Internet.’

Traditional web search engines rely on looking for keywords on web pages. They work well when looking for some kinds of information, but struggle with more complex queries – typing ‘restaurants near me serving pasta al pesto as today’s special’ isn’t likely to produce useful results.

With DIADEM, Professor Gottlob aims to create software that can trawl through every website in a particular field – the property market, for example, or restaurants, or air travel – and pull out the information they contain in a structured form. Equipped with a basic knowledge of the general principles its domain works on, it will be able to analyse each web page’s low-level structure as it goes in order to extract the information it contains.

Professor Gottlob said: ‘Humans find it easy to visit a new website and immediately grasp its structure and what the different elements on each page mean – which of the numbers visible is an item’s price, for example, or how to interpret a timetable. But computers struggle with this kind of semi-structured content – they don’t understand how websites are structured.’
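To make the idea concrete, here is a minimal sketch in Python of domain-guided extraction. It is not DIADEM's method, which the article does not describe in detail. It assumes a hypothetical property-listing page in which each listing is an HTML list item, and it uses two made-up domain rules (a pound sign marks a price, the word "bed" marks a bedroom count) to decide which numbers mean what.

```python
import re
from html.parser import HTMLParser

# Hypothetical domain knowledge for property listings: patterns that tell
# a price or a bedroom count apart from the other numbers on a page.
PRICE = re.compile(r"£\s?([\d,]+)")
BEDROOMS = re.compile(r"(\d+)\s*bed(?:room)?s?\b", re.IGNORECASE)


class ListingText(HTMLParser):
    """Collect the visible text of each <li> element as one candidate listing."""

    def __init__(self):
        super().__init__()
        self.listings = []
        self._parts = None  # text fragments of the <li> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._parts = []

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._parts is not None:
            # Join the fragments and normalise whitespace.
            self.listings.append(" ".join(" ".join(self._parts).split()))
            self._parts = None


def extract_properties(html):
    """Turn semi-structured HTML into a list of structured records."""
    parser = ListingText()
    parser.feed(html)
    records = []
    for text in parser.listings:
        price = PRICE.search(text)
        beds = BEDROOMS.search(text)
        if price and beds:  # keep only items that look like a property
            records.append({
                "price_gbp": int(price.group(1).replace(",", "")),
                "bedrooms": int(beds.group(1)),
                "description": text,
            })
    return records


# Made-up sample page; the third item contains numbers but is not a listing.
SAMPLE = """
<ul>
  <li><b>12 Mill Lane</b>, 3 bed house <span>£250,000</span></li>
  <li>Flat 4, 2 bedroom flat, <em>£180,000</em></li>
  <li>Call us on 01865 000000</li>
</ul>
"""

records = extract_properties(SAMPLE)
# Once the data is structured, a query is an ordinary filter.
cheap = [r for r in records if r["price_gbp"] < 200000]
```

The house number, the flat number and the telephone number are all ignored, because none of them matches the domain's notion of a price or a bedroom count. A real system would need far richer knowledge than two regular expressions, but the output has the same shape: records that other software can search or process directly.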

By the end of the DIADEM project, Professor Gottlob hopes to have built a system that can deal with a specified country’s property market, analysing tens of thousands of estate agents’ websites and presenting the properties discovered to users. The result won’t simply be a web page with links to other pages that may contain relevant information, as with traditional search engines; it will be a structured dataset drawn from the data objects found on sites within the domain, which can easily be searched or further processed by other software applications.

Companies like Google, Microsoft and Yahoo! have already expressed interest in DIADEM’s results, which could lead to the next generation of search engines, going beyond the limitations of keyword searching.

VERIWARE aims to make the theoretical and practical breakthroughs that will let us be sure that our technology is functioning as it should as we enter a new era of ubiquitous computing, in which information processing moves out of desktop computers and into the everyday objects all around us.

This kind of computing without computers has been called ‘everyware’. The technology is still in its infancy, but examples are starting to appear – from Bluetooth mobile phones that automatically sense each other’s presence and exchange information to fridges that can tell when the food in them is running out and automatically order more from the Internet.

In the future we are likely to be surrounded by countless tiny computers that monitor their environment with electronic sensors and share information with each other wirelessly, for example to give us access to healthcare and banking or to control the environment in our homes.

But as things stand there is no rigorous way to make sure these embedded systems work as they should. Already there have been high-profile cases where expensive products have had to be recalled due to unforeseen flaws in the software embedded in them. Some of these faults have merely been expensive for the manufacturer; others, such as problems with cars’ built-in computers controlling engine function or braking, could be fatal.

‘The pace of technological change is accelerating,’ said Professor Marta Kwiatkowska of Oxford University’s Computing Laboratory. ‘We need a paradigm shift in software verification to let us deal with the challenges posed by complex communities of “everyware”.’ She specialises in what is known as ‘model checking’ – a technique that uses dedicated software to analyse a computer program, model all the possible states it can enter and prove whether it will ever reach a specified situation, or how long it could take to reach it. The technique has been used to solve problems such as determining the worst-case time taken to transfer a given amount of information over a wireless network.
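The idea behind model checking can be sketched in a few lines of Python. The example below is a toy, not VERIWARE's tooling: it assumes a hypothetical sender that must deliver three packets over a channel which loses any one packet at most twice in a row. It enumerates every state the model can enter, checks that an unwanted situation is never reached, and computes the worst-case number of transmission attempts.

```python
from collections import deque
from functools import lru_cache

PACKETS = 3       # hypothetical: number of packets to deliver
MAX_RETRIES = 2   # hypothetical: a packet is lost at most twice in a row

INITIAL = (0, 0)  # (packets delivered, failed attempts on the current packet)


def successors(state):
    """All states the model can move to in one transmission attempt."""
    delivered, failures = state
    if delivered == PACKETS:
        return []                              # transfer complete
    nxt = [(delivered + 1, 0)]                 # the attempt succeeds
    if failures < MAX_RETRIES:
        nxt.append((delivered, failures + 1))  # the attempt is lost
    return nxt


def reachable_states(initial):
    """Breadth-first exploration of every state reachable from `initial`."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


@lru_cache(maxsize=None)
def worst_case_steps(state):
    """Largest number of attempts from `state` until the transfer completes.

    This recursion terminates because the model has no cycles: every
    attempt either delivers a packet or uses up a retry.
    """
    nxt = successors(state)
    if not nxt:
        return 0
    return 1 + max(worst_case_steps(s) for s in nxt)


states = reachable_states(INITIAL)
# Safety property: the retry limit is never exceeded in any reachable state.
never_exceeds_retries = all(failures <= MAX_RETRIES for _, failures in states)
# Timing property: worst-case number of attempts to deliver everything.
worst = worst_case_steps(INITIAL)
```

With these numbers the model has only ten reachable states, so exhaustive exploration is trivial, and the worst case works out to nine attempts (three per packet). Real systems have vastly larger state spaces, which is why dedicated tools and new theory are needed.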

But model-checking a computer program once before it is installed is a very different proposition to continuously and autonomously model-checking everyware that interacts constantly with the other devices around it, in an uncertain environment and without human input.

VERIWARE aims to make the major theoretical breakthroughs needed to give us the confidence to depend on these devices that will soon be part of all our lives.