Specifications for News Aggregator project outlined at the Network X gathering¶
Purpose of project¶
To create a website that pulls in news feeds from multiple sources and aggregates them into a single source of reference for the user.
The essential specifications for this project laid out at the Network X gathering held in Manchester on the weekend of 15/16th January 2011 are as follows:
- Website will initially be English language
- Website will Focused on anti-cuts content(for the launch), but with the plan to
- expand in the future
- The website will display user-filtered content: there is to be no editorial control, (however some degree of
- moderation will need to be practised in an unobtrusive way)
- The website is to be launched as soon as possible
Division of contributions to the project¶
The main areas of contribution required for development of this project are as follows
- Tech and programming
- Source finding for news resources
- Web design
- Promotion and publicity of the site, copy when it needs to be written
- Feedback from people on matters of accessibility and aesthetics
- Ongoing assistance to put tags on articles and content
Technical specifications for project¶
Design principles – essential¶
- Ability to aggregate RSS/atom feeds from a multitude of sources into a single unified output
- Ability for the user to filter that output into a personalised news feed
- Must be simple to use
- Must be initially deployable quickly with further enhancement and development continuing concurrently
Stage 1: Simple system produced that can aggregate feeds and display them to the user based on their preferences. System then handed to designers and usability group for testing.
Stage 2: Changes made based on feedback from these groups
Stage 3: Alpha version released and publicised
Stage 4: Gain feedback from wider user base of alpha release and incorporate recommended changes
Stage 5: Beta version release and wider publicity. Forum or other tool linked to project to ensure ongoing feedback possible
Stage 6: Full release schedule to be mapped. Enhancements to Indexing engine released as available
The finished aggregator will be made up of several components described here. The components do not necessarily need to be part of a single CMS eg Drupal and could be developed seperately provided they are able to communicate with each other in a standards compliant way. The project as a whole could be developed as a single tech effort, or broken down into smaller components with distinct teams of geeks working on each.
Aggregation Engine Spec¶
The aggregation engine can be either a standalone component or part of a CMS and must be able to do the following
- Retrieve feeds from other sources in RSS/Atom format
- Retrieve tags and other metadata along with the feeds
- Aggregate those feeds into a single output, preserving metadata and other XML and pass them to the Indexing Engine
Indexing Engine spec¶
The indexing engine will order and sort incoming data into the (most likely MySQL) database. It will handle linking additional metadata from external sources to articles that have been handed to it by the aggregation engine. It must be able to do the following:
- Retrieve aggregated list of content from the aggregation engine
- Directly alter the underlying database
- Be able to link additional metadata to an article
- Be able to construct database queries
- Be able to index and sort data from a multitude of sources
- Be able to order results of a search query by relevance
- Return a list of results from the database that are relevantly linked to each other
FISE is an open source framework for semantic enhancement engines. Running FISE you can send English documents as plain text to the FISE server via a RESTful web interface and get back semantic annotations for this content. The annotations are computed using different “enhancement engines” which can be plugged into FISE. Depending on the active enhancement engines FISE will annotate e.g. people and places. The possibility exists for using FISE to enhance the metadata that is included by both the originating feed and user-generated tagging and provide relevant, linked output.
Search query handler¶
The search query handler will be responsible for constructing database queries from a number of sources including
- Direct user searches (a search box)
- User defined preferences
- knowledge of user preferences gained from previous sessions
The user interface design will be led by user feedback. The core requirements for the user interface are that it is as simple as possible to use and understand.
Trying to collate a usable specification for the project that can focus the development. Feel free to pitch in.
In particular, it would be really useful for those working on sourcing/design/usability to add to the document too
Hey you have done good work putting this all together JimDog. How is progress looking?
I’ve got a working aggregation system at ugli.aktivix.org based on drupal and currently have it in maintenance mode whilst I strip out unecessary modules and get to grips with it. It is essentially a stripped down version of bethemedia.org.uk and instead of the publishing options I am about to install the drupal module flag for peer recommendations. The taxonomy for searching needs a little work, anyone who knows drupal at all and wants a login to get involved at this stage let me know :)
Also installed a semantic tagging engine and fine tuning that too
I now have a working alpha version and have added a bunch of feeds to it already. The login isn’t quite there yet, will have that working in the next few days, but you can see the site working at ugli.aktivix.org
can people have a look and give me some feedback perhaps.
User tagging is working (when i get the login sorted out that is). Saving custom searches based on tags works, and I am working on a “starring mechanism” to add things to user generated channels. It will all make more sense when the login works I hope :-)
HEY jimdog sorry for lack of replies – i will have a look at this over the weekend and aim to give feedback next week
good work – it would be great to have it fully operational for the TUC march 26 demo
It’s all working, just needs a URL and a server to host it on properly (though I think it is ok where it is for the short term at least