org.varienaja.util.wikipedia
Class WikipediaSearcher

java.lang.Object
  extended by org.varienaja.util.wikipedia.WikipediaSearcher

public class WikipediaSearcher
extends java.lang.Object

Class that provides functions to search in Wikipedia.

Author:
Varienaja

Constructor Summary
WikipediaSearcher()
           
 
Method Summary
protected static java.lang.String[] extractGenrePart(java.lang.String content)
          Searches for a substring like "Genre =[[Rhythm and blues|R&B]], [[Funk]], [[Rock music|Rock]]" in the content.
protected static java.lang.String[] extractGenres(java.lang.String genrespart)
          Processes a String like "[[Rhythm and blues|R&B]], [[Funk]], [[Rock music|Rock]]" and creates a String Array, containing only the text.
static java.lang.String getBandInfo(java.lang.String bandname)
          Retrieves Band info from Wikipedia.
protected static java.lang.String getBandURL(java.lang.String bandname)
          Searches Wikipedia for the URL of the given band.
protected static java.lang.String getContentFromDocument(org.w3c.dom.Document doc)
          Returns the content-element of an XMLDocument.
protected static java.lang.String getContentURL(java.lang.String query)
          Constructs a URL for getting contents from Wikipedia
protected static org.w3c.dom.Document getDocumentFromInputStream(java.io.InputStream in)
          Returns the Document object which was created from the xml in the inputstream.
static java.lang.String getExternalBandURL(java.lang.String bandname)
          Get the Wikipedia URL for a Band.
static java.lang.String[] getKeywords(java.lang.String bandname, java.lang.String playlistname)
          Searches Wikipedia for Genre(s) of a specific playlist.
protected static java.util.List<java.lang.String> getLinks(java.io.InputStream in)
          Returns all links that are present in a certain InputStream.
protected static java.util.List<java.lang.String> getLinksFromDocument(org.w3c.dom.Document doc)
          Returns all links that are present in a Document.
protected static java.lang.String getSearchURL(java.lang.String query)
          Constructs a URL for searching Wikipedia
protected static org.w3c.dom.Document queryWikipedia(java.lang.String location)
          Searches Wikipedia, and returns the resulting page as a completely parsed XMLDocument
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WikipediaSearcher

public WikipediaSearcher()
Method Detail

getKeywords

public static java.lang.String[] getKeywords(java.lang.String bandname,
                                             java.lang.String playlistname)

Searches Wikipedia for Genre(s) of a specific playlist. Results are cached, so this method can be called cheaply for every song of a playlist.

Parameters:
bandname - The name of the Band
playlistname - The name of the Playlist
Returns:
An array of Strings, describing the Genre(s) of the Playlist
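
The caching described above could look roughly like the following sketch. It is not the class's actual implementation: the class name KeywordCacheSketch, the injected lookup function, and the key format are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

// Hypothetical sketch of a per-(band, playlist) result cache.
final class KeywordCacheSketch {
    private final Map<String, String[]> cache = new ConcurrentHashMap<>();
    // Stand-in for the real Wikipedia lookup, which the source does not show.
    private final BiFunction<String, String, String[]> lookup;

    KeywordCacheSketch(BiFunction<String, String, String[]> lookup) {
        this.lookup = lookup;
    }

    String[] getKeywords(String bandname, String playlistname) {
        // A NUL separator keeps ("ab", "c") and ("a", "bc") from sharing a key.
        String key = bandname + '\u0000' + playlistname;
        // The lookup runs only on the first request for a given key.
        return cache.computeIfAbsent(key, k -> lookup.apply(bandname, playlistname));
    }
}
```

With a cache like this, calling getKeywords once per song in a playlist triggers a single lookup per band and playlist pair.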

getBandURL

protected static java.lang.String getBandURL(java.lang.String bandname)
                                      throws WikipediaException
Searches Wikipedia for the URL of the given band. TODO Refactor repetitive code.

Parameters:
bandname - The Band to search for.
Returns:
The String that can be appended to the standard Wikipedia URL to open the page for the Band. The result is null if no band was found.
Throws:
WikipediaException - When searching went wrong somehow.

getDocumentFromInputStream

protected static org.w3c.dom.Document getDocumentFromInputStream(java.io.InputStream in)
Returns the Document object which was created from the xml in the inputstream.

Parameters:
in - The InputStream containing XML-data
Returns:
The w3c.dom.Document
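
As an illustration of this step, the standard JAXP API can turn an InputStream into a Document. The sketch below is an assumption about how such a method might be written, not the class's actual code; the name XmlParseSketch is hypothetical, and wrapping checked exceptions in IllegalStateException is a choice made for this example only.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical sketch: parse XML from a stream into a DOM Document.
final class XmlParseSketch {
    static Document parse(InputStream in) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(in);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Error handling in the real method is not documented; this is a placeholder.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```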

getLinks

protected static java.util.List<java.lang.String> getLinks(java.io.InputStream in)
Returns all links that are present in a certain InputStream. Only call this method with an InputStream containing XML data that comes from Wikipedia.org.

Parameters:
in - The InputStream containing XML-data
Returns:
All (Wikipedia) links that are found in the InputStream, or null if an error occurred.

getLinksFromDocument

protected static java.util.List<java.lang.String> getLinksFromDocument(org.w3c.dom.Document doc)
Returns all links that are present in a Document. Only call this method with a Document containing data that comes from Wikipedia.org. TODO Use XPath?

Parameters:
doc - The document
Returns:
All (Wikipedia) links that are found in the Document
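
The XPath idea raised in the TODO could be sketched as follows. The element and attribute names are an assumption: the example expects links as pl elements with a title attribute, which this page does not confirm. XPathLinksSketch and its parse helper are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical sketch: collect link titles from a Document with XPath.
final class XPathLinksSketch {
    static List<String> linkTitles(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Assumed layout: <pl title="..."/> elements anywhere in the document.
            NodeList nodes = (NodeList) xpath.evaluate("//pl/@title", doc, XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                titles.add(nodes.item(i).getNodeValue());
            }
            return titles;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Small helper so the sketch can be tried on an XML string.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If the real response uses different element names, only the XPath expression needs to change.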

getContentFromDocument

protected static java.lang.String getContentFromDocument(org.w3c.dom.Document doc)
Returns the content-element of an XMLDocument.

Parameters:
doc - The Document
Returns:
Everything between <content> and </content>
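
A minimal sketch of reading the content element with the DOM API follows. It assumes the first content element in the document is the one wanted, and it returns null when none is present. Both choices are assumptions, as are the names ContentElementSketch, contentOf and parse.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical sketch: return the text inside the first <content> element.
final class ContentElementSketch {
    static String contentOf(Document doc) {
        NodeList nodes = doc.getElementsByTagName("content");
        // Assumption: a missing element is reported as null.
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    // Small helper so the sketch can be tried on an XML string.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```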

getSearchURL

protected static java.lang.String getSearchURL(java.lang.String query)
Constructs a URL for searching Wikipedia

Parameters:
query - The query
Returns:
The URL
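
The main pitfall in building such a URL is encoding the query, since band names often contain spaces, commas or ampersands. The sketch below shows one way to do it. The base URL is an assumption (this page does not say which endpoint the class uses), and SearchUrlSketch is a hypothetical name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical sketch: build a search URL with a properly encoded query.
final class SearchUrlSketch {
    // Assumed endpoint; the real class may use a different URL.
    private static final String ASSUMED_BASE =
        "https://en.wikipedia.org/w/api.php?action=query&list=search&format=xml&srsearch=";

    static String searchUrl(String query) {
        try {
            // Form-encodes the query: spaces become '+', reserved characters become %XX.
            return ASSUMED_BASE + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, the query "Earth, Wind & Fire" ends up in the URL as Earth%2C+Wind+%26+Fire.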

getContentURL

protected static java.lang.String getContentURL(java.lang.String query)
Constructs a URL for getting contents from Wikipedia

Parameters:
query - The query
Returns:
The URL

queryWikipedia

protected static org.w3c.dom.Document queryWikipedia(java.lang.String location)
Searches Wikipedia, and returns the resulting page as a completely parsed XMLDocument

Parameters:
location - The URL
Returns:
The Document, containing the links of the page.

extractGenrePart

protected static java.lang.String[] extractGenrePart(java.lang.String content)
Searches for a substring like "Genre =[[Rhythm and blues|R&B]], [[Funk]], [[Rock music|Rock]]" in the content.

Parameters:
content - The String to search in
Returns:
A String Array, containing the Keywords found.
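
Locating the genre field can be sketched with a regular expression. This example only finds the text after "Genre =" on the matching line and returns it unprocessed. The documented method returns the keywords as a String array, so this is a partial illustration under assumed input formatting. GenreFieldSketch and findGenreValue are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: find the value of a "Genre = ..." field in page content.
final class GenreFieldSketch {
    // Assumes the field sits on one line, optionally prefixed by "|".
    private static final Pattern GENRE_FIELD = Pattern.compile(
        "^[ \\t]*\\|?[ \\t]*genre[ \\t]*=[ \\t]*(.*)$",
        Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    static Optional<String> findGenreValue(String content) {
        Matcher m = GENRE_FIELD.matcher(content);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }
}
```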

extractGenres

protected static java.lang.String[] extractGenres(java.lang.String genrespart)

Processes a String like "[[Rhythm and blues|R&B]], [[Funk]], [[Rock music|Rock]]" and creates a String Array, containing only the text.

[[X]] becomes X
[[X|Y]] becomes Y

TODO Create unittests for this method, and use a different method for the splitting

Parameters:
genrespart - The inputstring
Returns:
An Array of String containing the Keywords found in the inputstring.
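
The two rewriting rules above ([[X]] becomes X, [[X|Y]] becomes Y) can be sketched with a regular expression instead of string splitting. This is an illustration, not the method's actual implementation. GenreLinkSketch and extractLabels are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: reduce wiki links to their visible text.
final class GenreLinkSketch {
    // Group 1 is the link target; group 2 is the optional label after '|'.
    private static final Pattern WIKI_LINK =
        Pattern.compile("\\[\\[([^\\]|]+)(?:\\|([^\\]]+))?\\]\\]");

    static String[] extractLabels(String genrespart) {
        List<String> labels = new ArrayList<>();
        Matcher m = WIKI_LINK.matcher(genrespart);
        while (m.find()) {
            // Prefer the label ([[X|Y]] gives Y); fall back to the target ([[X]] gives X).
            String label = m.group(2) != null ? m.group(2) : m.group(1);
            labels.add(label.trim());
        }
        return labels.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String input = "[[Rhythm and blues|R&B]], [[Funk]], [[Rock music|Rock]]";
        // Prints: R&B, Funk, Rock
        System.out.println(String.join(", ", extractLabels(input)));
    }
}
```

Matching the links directly means text outside [[...]], such as separators, is simply ignored.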

getBandInfo

public static java.lang.String getBandInfo(java.lang.String bandname)
                                    throws WikipediaException
Retrieves Band info from Wikipedia. Bandinfo is just the general contents of the Wikipedia-entry.

Parameters:
bandname - The name of the band.
Returns:
The first paragraph of the Wikipedia entry for the given Band.
Throws:
WikipediaException - When no info was found or the contents returned could not be deciphered.

getExternalBandURL

public static java.lang.String getExternalBandURL(java.lang.String bandname)
Get the Wikipedia URL for a Band.

Parameters:
bandname - The name of the band.
Returns:
The (external) URL, linking to the Wikipedia-entry for the Band. The result is null if no URL could be found.


Copyright © 2010 A.J.V. All Rights Reserved.