https://github.com/molekilla/osoframework
Automatically exported from code.google.com/p/osoframework
- Host: GitHub
- URL: https://github.com/molekilla/osoframework
- Owner: molekilla
- Created: 2015-03-14T17:04:50.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2015-05-06T20:26:15.000Z (over 9 years ago)
- Last Synced: 2024-04-10T02:56:46.191Z (9 months ago)
- Language: C#
- Size: 2.11 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# osoframework
Web robot framework that works with .NET 3.5 and Mono 2.3.

Example code:
```
// Using Read: returns the response as a string
string goog1 = Read(new HttpSettings { Query = "http://www.google.com" });

// Using ReadXml: get an XDocument for a URL
// The pre-parsing callback is for manually fixing non-conforming XHTML
var goog2 = ReadXml(new HttpSettings { Query = "http://www.google.com" }, x => x);

byte[] a = ReadWebBinaryResource("http://subsonicproject.com/content/images/SubSonicSMall.png");

// Example: parse a site using registered navigation steps
// Get states
states = Navigation.First.Read().ParseXElement
(data =>
{
    return from lnk in GetTagElements(data, "").HtmlAnchors()
           where lnk.Attribute("HREF") != null
              && lnk.Attribute("HREF").Value.StartsWith("/sv/buscar.html")
           select lnk;
}
);
```

### Why OsoFramework
Because my dog looks like a bear. And also it rhymes with Mono and Awesome.

### What OsoFramework offers
* Easy parsing of XHTML pages
* A fast way to save extracted data to databases (SQLite, SQL Server, MySQL)
* LINQ extensions for XDocument that make HTML tags easy to parse
* A text search tag converter for when HTML is hard to parse
* Extensive logging support

### Introduction
This tutorial explains how to create a new web robot using OsoFramework.

### Adding references to a new project
Create a new solution and project in Visual Studio .NET or MonoDevelop. Add the following references:

* log4net (found in Third Party folder)
* System.Configuration
* System.Linq
* System.Xml
* OsoFramework
* OsoFramework.Http
* SubSonic (found in Third Party folder)
* IronPython 2.6 (found in Third Party folder)
* Microsoft Unity Block (found in Third Party folder)

Following the OsoExamples project organization, add a DatabaseSchema folder to store the database schemas and a Robot folder to store the robots.

### Creating the database schema
In the DatabaseSchema folder, create a new class file named SearchResult.cs. Make the class implement IParseData and add the properties shown below.
```
using System;

public class SearchResult : IParseData
{
    public SearchResult()
    {
        Title = string.Empty;
        Url = string.Empty;
        Description = string.Empty;
        LastUpdated = DateTime.Now;
    }

    /// <summary>
    /// By default we have an autonumber
    /// </summary>
    public Int64 ID { get; set; }

    /// <summary>
    /// KeyIndex is the column we want to index, useful for querying existing items.
    /// Here it is an alias for Url.
    /// </summary>
    public string KeyIndex
    {
        get { return Url; }
        set { Url = value; }
    }

    public string Url { get; set; }
    public string Description { get; set; }
    public string Title { get; set; }
    public DateTime LastUpdated { get; set; }
}
```
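
The KeyIndex comment says the indexed column is useful for querying existing items. The following is a minimal sketch of that idea, not the framework's own lookup code. It assumes a stand-in IParseData interface (the real one ships with OsoFramework and may declare other members) and a hypothetical Deduplicate helper that keeps only the first result seen for each KeyIndex value.

```
using System;
using System.Collections.Generic;

// Stand-in for OsoFramework's IParseData (assumption: the real interface may differ).
public interface IParseData
{
    Int64 ID { get; set; }
    string KeyIndex { get; set; }
}

// Trimmed copy of the tutorial's SearchResult, enough to show the KeyIndex alias.
public class SearchResult : IParseData
{
    public Int64 ID { get; set; }
    public string KeyIndex
    {
        get { return Url; }
        set { Url = value; }
    }
    public string Url { get; set; }
    public string Title { get; set; }
}

public static class KeyIndexDemo
{
    // Hypothetical helper: keeps the first result for each distinct KeyIndex.
    public static List<SearchResult> Deduplicate(IEnumerable<SearchResult> results)
    {
        var seen = new HashSet<string>();
        var unique = new List<SearchResult>();
        foreach (SearchResult result in results)
        {
            // HashSet.Add returns false when the key is already present.
            if (seen.Add(result.KeyIndex))
            {
                unique.Add(result);
            }
        }
        return unique;
    }

    public static void Main()
    {
        var results = new List<SearchResult>
        {
            new SearchResult { Url = "http://example.com/a", Title = "A" },
            new SearchResult { Url = "http://example.com/b", Title = "B" },
            new SearchResult { Url = "http://example.com/a", Title = "A again" }
        };

        foreach (SearchResult result in Deduplicate(results))
        {
            Console.WriteLine(result.KeyIndex + " " + result.Title);
        }
    }
}
```

Because KeyIndex reads and writes Url, two results with the same URL count as the same item, so the third entry above is dropped.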
Create another class called SearchResultDataRepository, which is the repository class for the new search results schema.
```
// Use IDataRepositoryBase if a custom data access API is required
public class SearchResultDataRepository : DataRepositoryBase
{
    public SearchResultDataRepository()
    {
    }
}
```
### Creating the robot class
Add a new class in the Robot folder that inherits from WebRobotBase and implements IWebRobot. Name the class GoogleSearch. Add the Start and Main methods as follows:
```
public void Start()
{
    Main();
}

private void Main()
{
    // using read
    string goog1 = Read(new HttpSettings { Query = "http://www.google.com" });
    // info prints to log and enterprise manager console
    Info(goog1);

    // using get XDocument for url
    // pre parsing is for manually parsing non conforming XHTML, not used here
    var goog2 = ReadXml(new HttpSettings { Query = "http://www.google.com" },
        response => response);

    byte[] a = ReadWebBinaryResource("http://subsonicproject.com/content/images/SubSonicSMall.png");
    Info("google ended");
}
```
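
The second argument to ReadXml is described as a pre-parsing step for non-conforming XHTML, and the robot above passes it through unchanged. The sketch below shows what such a step might look like, using only System.Xml.Linq from .NET 3.5. ParseWithPreParse and LinksStartingWith are hypothetical helpers written for this illustration, not OsoFramework APIs, and the framework's actual behavior may differ.

```
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class PreParseDemo
{
    // Hypothetical helper: run the pre-parse callback on the raw markup, then parse it as XML.
    public static XDocument ParseWithPreParse(string html, Func<string, string> preParse)
    {
        return XDocument.Parse(preParse(html));
    }

    // Hypothetical helper: collect href values of anchors that start with the given prefix.
    // XML names are case-sensitive, so this matches lowercase "a" and "href" only.
    public static List<string> LinksStartingWith(XDocument doc, string prefix)
    {
        return doc.Descendants("a")
            .Select(anchor => anchor.Attribute("href"))
            .Where(href => href != null
                && href.Value.StartsWith(prefix, StringComparison.Ordinal))
            .Select(href => href.Value)
            .ToList();
    }

    public static void Main()
    {
        // The unclosed <br> makes this markup invalid XML until the callback repairs it.
        string html = "<html><body><a href=\"/sv/buscar.html?q=1\">one</a><br>"
            + "<a href=\"/other\">two</a><a name=\"x\">three</a></body></html>";

        XDocument doc = ParseWithPreParse(html, raw => raw.Replace("<br>", "<br/>"));

        foreach (string link in LinksStartingWith(doc, "/sv/buscar.html"))
        {
            Console.WriteLine(link); // prints "/sv/buscar.html?q=1"
        }
    }
}
```

Passing an identity lambda such as response => response, as the robot does, is enough when the page is already well-formed.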
### Configuring Dependency Injection (Microsoft Unity)
In the Unity configuration, add the name of the robot, which is required for it to work correctly. Additionally, add a connection string in the connection strings section and set the correct data repository type.
```
```