Aggregator Specifications

The technical specifications of the news aggregator project

Specifications for News Aggregator project outlined at the Network X gathering

Purpose of project

To create a website that pulls in news feeds from multiple sources and aggregates them into a single source of reference for the user.

Essential specifications

The essential specifications for this project laid out at the Network X gathering held in Manchester on the weekend of 15/16th January 2011 are as follows:

  • Website will initially be English language
  • Website will Focused on anti-cuts content(for the launch), but with the plan to
  • expand in the future
  • The website will display user-filtered content: there is to be no editorial control, (however some degree of
  • moderation will need to be practised in an unobtrusive way)
  • The website is to be launched as soon as possible

Division of contributions to the project

The main areas of contribution required for development of this project are as follows

  • Tech and programming
  • Source finding for news resources
  • Web design
  • Promotion and publicity of the site, copy when it needs to be written
  • Feedback from people on matters of accessibility and aesthetics
  • Ongoing assistance to put tags on articles and content

Technical specifications for project

Design principles – essential

  • Ability to aggregate RSS/atom feeds from a multitude of sources into a single unified output
  • Ability for the user to filter that output into a personalised news feed
  • Must be simple to use
  • Must be initially deployable quickly with further enhancement and development continuing concurrently


Stage 1: Simple system produced that can aggregate feeds and display them to the user based on their preferences. System then handed to designers and usability group for testing.

Stage 2: Changes made based on feedback from these groups

Stage 3: Alpha version released and publicised

Stage 4: Gain feedback from wider user base of alpha release and incorporate recommended changes

Stage 5: Beta version release and wider publicity. Forum or other tool linked to project to ensure ongoing feedback possible

Stage 6: Full release schedule to be mapped. Enhancements to Indexing engine released as available

Technical components

The finished aggregator will be made up of several components described here. The components do not necessarily need to be part of a single CMS eg Drupal and could be developed seperately provided they are able to communicate with each other in a standards compliant way. The project as a whole could be developed as a single tech effort, or broken down into smaller components with distinct teams of geeks working on each.

Aggregation Engine Spec

The aggregation engine can be either a standalone component or part of a CMS and must be able to do the following

  • Retrieve feeds from other sources in RSS/Atom format
  • Retrieve tags and other metadata along with the feeds
  • Aggregate those feeds into a single output, preserving metadata and other XML and pass them to the Indexing Engine

Indexing Engine spec

The indexing engine will order and sort incoming data into the (most likely MySQL) database. It will handle linking additional metadata from external sources to articles that have been handed to it by the aggregation engine. It must be able to do the following:

  • Retrieve aggregated list of content from the aggregation engine
  • Directly alter the underlying database
  • Be able to link additional metadata to an article
  • Be able to construct database queries
  • Be able to index and sort data from a multitude of sources
  • Be able to order results of a search query by relevance
  • Return a list of results from the database that are relevantly linked to each other


FISE is an open source framework for semantic enhancement engines. Running FISE you can send English documents as plain text to the FISE server via a RESTful web interface and get back semantic annotations for this content. The annotations are computed using different “enhancement engines” which can be plugged into FISE. Depending on the active enhancement engines FISE will annotate e.g. people and places. The possibility exists for using FISE to enhance the metadata that is included by both the originating feed and user-generated tagging and provide relevant, linked output.

Search query handler

The search query handler will be responsible for constructing database queries from a number of sources including

  • Direct user searches (a search box)
  • User defined preferences
  • knowledge of user preferences gained from previous sessions

User interface

The user interface design will be led by user feedback. The core requirements for the user interface are that it is as simple as possible to use and understand.


Trying to collate a usable specification for the project that can focus the development. Feel free to pitch in.

In particular, it would be really useful for those working on sourcing/design/usability to add to the document too


Hey you have done good work putting this all together JimDog. How is progress looking?


I’ve got a working aggregation system at based on drupal and currently have it in maintenance mode whilst I strip out unecessary modules and get to grips with it. It is essentially a stripped down version of and instead of the publishing options I am about to install the drupal module flag for peer recommendations. The taxonomy for searching needs a little work, anyone who knows drupal at all and wants a login to get involved at this stage let me know :)

Also installed a semantic tagging engine and fine tuning that too



I now have a working alpha version and have added a bunch of feeds to it already. The login isn’t quite there yet, will have that working in the next few days, but you can see the site working at

can people have a look and give me some feedback perhaps.

User tagging is working (when i get the login sorted out that is). Saving custom searches based on tags works, and I am working on a “starring mechanism” to add things to user generated channels. It will all make more sense when the login works I hope :-)


HEY jimdog sorry for lack of replies – i will have a look at this over the weekend and aim to give feedback next week

good work – it would be great to have it fully operational for the TUC march 26 demo


It’s all working, just needs a URL and a server to host it on properly (though I think it is ok where it is for the short term at least