StupIdiocy
Innovation through Idiotic Ideas
Aug 8th
Now and then I come across a set of data on the Internet that I wish I could toss into an Excel spreadsheet for sorting, but when it spans more than a few pages, copy/paste is out of the question. For times like these I have generally written a web scraper in C# with some crazy regular expressions. Today I needed to grab a much larger dataset generated by a very old application we have on hand. The app's interface generates very clean HTML, but unfortunately its data is stored in an old proprietary format, and we needed to move it into one of our SQL Servers to be able to report against it.
Since the HTML is much more complex this time around, I knew regular expressions were not going to cut it, so I started looking for an HTML parsing engine. I came across the HTML Agility Pack on CodePlex and was excited by the feature list, so I decided to give it a try. After importing the library I immediately noticed that the primary object I would be using, HtmlAgilityPack.HtmlDocument, conflicts with System.Windows.Forms.HtmlDocument. If this were a larger project I would build out a class to do the parsing so I didn’t have to fully qualify names around the conflict, but I had to remind myself that I will only be running this import once, so there’s no need to over-engineer things.
To load your HTML into the HtmlDocument class you can either load it directly from a stream or download the data yourself and pass it in as a string. I opted for the quick and dirty method with WebClient.DownloadData since I didn’t want to deal with asynchronous calls or maintain any sort of state in the application. I also added a failsafe try/catch that returns an empty document if the server fails to deliver a page, although a real tool should have a proper error handler here.
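using System.Net;
using System.Text;
using HtmlAgilityPack;

// Download a page and load it into an HTML Agility Pack document.
// Fully qualified to avoid the clash with System.Windows.Forms.HtmlDocument.
private HtmlAgilityPack.HtmlDocument ParseHtml(string URL)
{
    HtmlAgilityPack.HtmlDocument hDoc = new HtmlAgilityPack.HtmlDocument();
    try
    {
        // "using" disposes the WebClient once the download is finished
        using (WebClient wClient = new WebClient())
        {
            byte[] bData = wClient.DownloadData(URL);
            // Assumes the server sends UTF-8; ASCII decoding would turn every byte above 127 into '?'
            hDoc.LoadHtml(Encoding.UTF8.GetString(bData));
        }
    }
    catch (WebException)
    {
        // Failsafe: hand back an empty document if the page could not be downloaded
        hDoc.LoadHtml("");
    }
    return hDoc;
}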
Now that we have an HtmlDocument we can use the standard SelectNodes and SelectSingleNode methods with some XPath to grab the proper nodes. For instance, here I loop through all divs on the page that have a class value of “result”.
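// SelectNodes returns null, not an empty collection, when nothing matches
HtmlNodeCollection hNodes = hDoc.DocumentNode.SelectNodes("//div[@class='result']");
if (hNodes != null)
{
    foreach (HtmlNode hNode in hNodes) Log(hNode.InnerText);
}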
By building more and more complex XPath statements you can drill right down to the value you need to store. In this program I built a List<List<string>> object, a list of rows where each row is a list of column values, which can then be saved to a CSV file with a simple function.
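using System.Collections.Generic;
using System.IO;
using System.Linq;

// Setup the data storage: a list of rows, each row a list of column values
List<List<string>> lData = new List<List<string>>();
// Add a few rows of data
List<string> lRow;
lRow = new List<string>();
lRow.Add("column1");
lRow.Add("column2");
lRow.Add("column3");
lData.Add(lRow);
lRow = new List<string>();
lRow.Add("column1");
lRow.Add("column2");
lRow.Add("column3");
lData.Add(lRow);
// Write the rows of data; "using" closes the file even if a write fails
using (TextWriter tWriter = new StreamWriter("data.csv"))
{
    foreach (List<string> lItem in lData)
    {
        // Quote every column (doubling any embedded quotes) and join with commas,
        // so lines no longer end with a stray trailing comma
        string sLine = string.Join(",", lItem.Select(sData => "\"" + sData.Replace("\"", "\"\"") + "\""));
        tWriter.WriteLine(sLine);
    }
}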
With the HTML Agility Pack I easily saved two hours on this code and I will be sure to tuck it away in my toolbox for the next time I need to deal with remote HTML.
Aug 3rd
Today at the office we once again had to deal with a bit of Wikipedia vandalism, something that really shouldn’t happen as frequently as it does. Not only were two of our pages modified, but one was actually deleted by a Wikipedia moderator for “blatant advertising”. (To be honest it was a bit spammy, but I don’t think it crossed the line when compared to other high profile brands like ours.) We’ve had to deal with this on Wikipedia before, but I guess that is the nature of high profile pages. Since we hadn’t been keeping a close eye on these pages, the changes went unnoticed for quite some time.
We’ve now been tasked with monitoring certain pages for future vandalism, but what is the best way to handle this? My first thought was the built-in Wikipedia watchlist, but watchlists have proven problematic in the past, so I wanted to avoid them from the start. I also looked at using the Wikipedia RSS feeds to monitor for changes, as discussed on Digital Inspiration, but that requires setting everyone up with an RSS news reader or building out email alerts that would be vague at best. (These notifications need to go to various PR teams, not technology folks, so they need to be easy to understand.) At that point I decided to write a tool that specifically met the requirements of the task at hand.
The functionality I wanted for my first version was pretty simple: monitor pages on Wikipedia for changes and email an alert to the responsible team when something is detected. Since these changes seem to happen randomly, no one has to review the pages daily for vandalism; they only have to react when an alert comes in. Since I have full control over the email I can ensure it is BlackBerry friendly, making it even more useful. (Vandals tend to strike outside of normal business hours, go figure…)
Next I needed to decide how to detect changes and how to react to varying levels of vandalism. I decided to start with a basic system that computes an MD5 hash of the page text and compares it to the previous known good value. If the hashes differ, the page text is compared to determine how much changed, using a slightly modified O(ND) Difference Algorithm. The text is also scanned at this time for a list of known trigger words (swears). The amount of changed text and the weight of any trigger words found determine whether the alert email is sent as high priority. This ensures that no alert is generated for small updates, only when someone replaces a large block of text or fills the page with profanity.
I set up a local wiki as a testbed and so far things are looking promising. No false positives yet, and minor updates of a few words have gone through without a single alert. Adding one strong swear word, however, generates an instant high priority email. Perfect.
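A minimal sketch of that check, assuming the page text has already been fetched as a plain string. The names (`PageChangeCheck`, `AlertLevel`, `Classify`), the tokenizer, the trigger weights and the thresholds `minorEdit`, `majorEdit` and `triggerLimit` are all hypothetical, and the diff here is the plain Myers O(ND) algorithm run over words, not the modified version the tool uses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public enum AlertLevel { None, Normal, High }

// Hypothetical sketch; not the tool's actual code.
public static class PageChangeCheck
{
    // Lower-cased words, split on whitespace and common punctuation.
    public static string[] Tokenize(string text)
    {
        return text.ToLowerInvariant().Split(
            new[] { ' ', '\t', '\r', '\n', '.', ',', '!', '?', ';', ':' },
            StringSplitOptions.RemoveEmptyEntries);
    }

    // Hex MD5 of the page text; a mismatch only says that *something* changed.
    public static string Md5Hex(string text)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(text));
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // Myers' greedy O(ND) algorithm over word arrays: returns D, the minimum
    // number of word insertions plus deletions that turn a into b.
    public static int EditDistance(string[] a, string[] b)
    {
        int n = a.Length, m = b.Length, max = n + m;
        int[] v = new int[2 * max + 2]; // v[max + k] = furthest x reached on diagonal k
        for (int d = 0; d <= max; d++)
        {
            for (int k = -d; k <= d; k += 2)
            {
                int x = (k == -d || (k != d && v[max + k - 1] < v[max + k + 1]))
                    ? v[max + k + 1]      // move down: insert a word from b
                    : v[max + k - 1] + 1; // move right: delete a word from a
                int y = x - k;
                while (x < n && y < m && a[x] == b[y]) { x++; y++; } // follow matching words
                v[max + k] = x;
                if (x >= n && y >= m) return d;
            }
        }
        return max;
    }

    // Sum of the weights of trigger words found in the text.
    public static int TriggerWeight(string text, IDictionary<string, int> weights)
    {
        return Tokenize(text).Sum(w => weights.TryGetValue(w, out int wt) ? wt : 0);
    }

    public static AlertLevel Classify(string knownGood, string current,
        IDictionary<string, int> weights, int minorEdit = 5, int majorEdit = 50, int triggerLimit = 10)
    {
        if (Md5Hex(knownGood) == Md5Hex(current)) return AlertLevel.None; // unchanged page

        int diff = EditDistance(Tokenize(knownGood), Tokenize(current));
        // Only count trigger words that were not already on the known good page.
        int trigger = Math.Max(0, TriggerWeight(current, weights) - TriggerWeight(knownGood, weights));

        if (diff >= majorEdit || trigger >= triggerLimit) return AlertLevel.High;
        if (diff >= minorEdit || trigger > 0) return AlertLevel.Normal;
        return AlertLevel.None; // small edit with no trigger words
    }
}
```

Diffing words rather than characters keeps the distance roughly proportional to how much of the page a vandal rewrote, and the cheap hash check means the diff only runs when a page has actually changed.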
If there is any interest I can package and release the C# source code, just leave a comment below.
Aug 2nd
The experts on the ArchosFans Forum are at it again, this time giving us a nice little tutorial on building your own custom Android firmware for the Archos 7. Forum users pmarin and knightdominion teamed up to write the tutorial which shows you step by step how to access and modify the filesystem image, allowing you to create a totally custom load for your needs.
In order to create your own firmware you will need a few things: a base firmware, MyDroidDevTool, and a Linux environment.
Base Firmware
You need to start somewhere, and that is exactly what a base load is for. Either grab the latest Archos 7 firmware or one of the two custom firmware versions that knightdominion released (Rooted Archos 7 Firmware or Operation Unbrickable Firmware).
MyDroidDevTool
Download MyDroidDevTool and extract AFPTool.exe to the same location that you saved your base firmware. AFPTool is used to unpack and repack the update.img files and is the only file you need from the SDK.
Linux Environment
The tutorial calls for a Linux VM or some other way to mount an ext2 filesystem (such as Cygwin or a machine running Ubuntu). I will be using my Ubuntu server, but those wishing to do it all on their Windows desktops should look at one of the alternatives; any of them will get the job done.
Copy both AFPTool.exe and your firmware image to a new directory on your system and run the following from the command line in order to extract the update image.
AFPTool.exe -unpack update.img update
If there are any errors, download your update.img file again and retry; otherwise you should have a new directory named update that contains the extracted files.
Now that you have extracted the update image, copy the system.img file from the Image directory to your Linux machine. Once there, run the following commands to mount the stock image, copy its contents into a new writable ext2 image (system.ext2), and mount that copy.
mkdir stock-system
sudo mount -o loop system.img stock-system
genext2fs -d stock-system -b 300000 system.ext2
mkdir temp
sudo mount -o loop system.ext2 temp/
You will now have a directory named temp that contains the base system image that you can modify.
I will be revisiting this section in the future as I start getting into deeper hacking projects. Right now I am only starting to hack the base OS, so we’ll have to see where that leads.
Until then, check around the web for some good tutorials on hacking Android to learn more about what is possible.
When you have completed your changes to the files run the following commands to rebuild a new system image.
sudo mkfs.cramfs temp/ system.1.5.img
sudo umount stock-system
sudo umount temp
Take the system.1.5.img file from the previous step and copy it back to your Windows machine, replacing the original system.img you extracted. You can then execute the following command to pack up a new update.img file.
AFPTool.exe -pack update update.img
Testing your firmware will be just like any other installation on the Archos 7. Connect the device to your computer via USB and drag the update.img file to the root of the device. When you disconnect the tablet it should prompt you to update, and after a few reboots you will be running your modified firmware.
What plans do you have for the Archos 7 tablet? Let me know in the comments!
Jul 27th
The Android world is buzzing tonight with news that the Augen 7″ tablet will be available this week at K-Mart stores nationwide for $149.99. At first I dismissed the news, given the K-Mart name and the price point, but after reading the story I must say I’m interested. Android 2.1, a faster processor, and double the RAM: if the touchscreen works as well as (or better than) my Archos, we have a winner here. Since it ships with Android 2.1, an upgrade to 2.2 (Froyo) should happen in no time, if the Android community hasn’t pulled it off already. I am still a bit worried about the price, since at some point there is a line you cross between low cost and cheap. The Archos is a solid unit and feels like quality in your hands, even if it does have a bit of weight. Hopefully the Augen tablet feels this way as well and doesn’t feel like cheap plastic.
Augen GENTOUCH78 Specs:
- 7″ 800×480 display
- 800 MHz CPU
- 256 MB RAM
- 2 GB internal memory
- Android 2.1 OS
- Onboard WiFi and an SD/MMC card slot
- Leather case included
- $150 from K-Mart
Archos 7 Specs:
- 7″ 800×480 display
- 600 MHz ARM9 Rockchip 2808 CPU
- 128 MB RAM
- 8 GB internal memory
- Android 1.5 OS
- Onboard WiFi and an SD/MMC card slot
- Leather case purchased separately
- $200 from Amazon.com
I will be heading over to K-Mart this week to pick one of these up and give it a spin and I will be posting first impressions in the coming days. My addiction to Android is growing.
Jul 22nd
While browsing the Android Marketplace for new ways to use my Archos I came across Wyse Technology’s beta release of PocketCloud for Android. The description on the marketplace certainly piqued my interest.
Wyse PocketCloud™ enables complete access to your Windows PC, virtual machine or Remote Desktop Services from your mobile device.
Features:
- Intuitive User Interface
- High-accuracy Touch Pointer
- Remote Desktop Protocol (RDP 7)
- VMware View Support (Pending Certification)
- Enterprise Security
- VNC Protocol (Tech Preview)
I have quite a crazy little network setup in my house that consists of four Windows 7 machines (including the Media Center) and two Dell servers running VMware ESX, each with two virtual machines running Ubuntu Server Edition. To manage the machines I use all three protocols that PocketCloud is targeting, so it was the perfect fit for me. I did not expect VMware View support on the tablet, but it works like a champ and means I no longer have to search for an SSH client. I also installed the PocketCloud Windows Companion on my primary machine, which adds some extra features when the tablet is connected.
Included with Wyse PocketCloud is a Windows client-side companion application that once installed on your remote machine, enables more advanced capabilities such as Thin-Browser™ (an enhanced, server-side browser capability), enabling full access to Web sites with Flash content.
There isn’t much info about the desktop companion that I could find but the major benefit I see is having it automatically open the keyboard when a textbox requests input. Without the desktop software you have to open the menu and click the keyboard icon any time you need to type something. Obviously I won’t be doing any major work on my computers from the Archos but if I need to start a download, restart a service, or reboot a machine it works great. Any more than a few minutes though and I might as well walk to my computer.
In order to install PocketCloud on your Archos you will need access to the Android Marketplace. This means you will need to install one of the hacked Archos firmware versions that I’ve discussed in the past. (If you have the technical knowledge I highly recommend the rooted Firmware.) If you have already hacked your Archos or if you are on a normal Android device then click the marketplace link below to be taken directly to the PocketCloud download.
market://search?q=pname:com.wyse.pocketcloud
The application is very user friendly and while it might not be something I use every single day it is a nice application to add to my Archos 7 toolbelt.
Jul 15th
Well, he’s at it again. Dom has released his newest Archos 7 Firmware version over at the ArchosFans forums. This time he’s managed to pull off root access on the device and give us extra storage space for applications. I’ve hit the application storage space limit multiple times while using the tablet, so this is a very welcome update for me. I plan on starting fresh and testing all new applications after this update and the extra storage space will help. Be sure to post in the thread to let Dom know you appreciate his work and if possible donate a few bucks to the man. He deserves it.
Following the typical process I copied the update.apk to my Archos then disconnected the USB and started the update. The update went as expected and after another boot sequence I was presented with the new custom startup screen and I must say I like it a lot better than the stock image. I went through the startup screens then reconnected the tablet and was asked to format the drive as expected. The instructions tell you to wait 45 minutes or more after formatting so you don’t stress the ROM, but I couldn’t wait and flashed right after it was done. After the final boot sequence was completed I went and checked the free space on the device which is showing 290MB of application space. That’s over 5x more than the stock firmware and should be plenty of space for now.
One word of caution if you plan on installing this firmware: since it roots the device, applications can be given total control over the filesystem, so be careful what you install. Only install applications from trusted sources and pay attention to the security dialog that pops up. Dom was kind enough to include the Superuser Permissions application, which gives you an additional prompt, so make sure you know what you’re doing before you grant those permissions.
Jul 11th
With the number of connections we have to people and information sources on the Internet I am finding it difficult to maintain control over the information firehose. While having the Archos has helped me stay up to date I still find myself inundated and unable to manage everything as much as I would like. I was already looking for a development project for the Archos and this seemed like a good place to start. Since the processing power on the Archos is limited and I do not want to drain the battery by doing hundreds of queries every few minutes I am going to propose a client/server solution to solve the problem. My primary desktop will act as the server and will handle all of the acquisition, storage, and sorting of the incoming data. I would also like to do some more research into how the semantic web actually works so that I can properly group the data into usable chunks and metadata types. The client will run on the Archos unit itself and can then simply query the desktop server to fetch the latest information without wasting extra resources. This setup works perfectly for me as I tend to use my tablet when on the couch in front of the TV, so I have access to my local network and all of my servers.
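The server half of that split can be sketched as an in-memory store that the desktop fills and the tablet queries with "what has arrived since my last sync?", so the handheld never polls the original sources itself. `FeedItem`, `FeedStore` and `Since` are hypothetical names; the real framework keeps its data in a database and the transport between the C# server and the Java widget is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One piece of incoming information (hypothetical shape).
public sealed class FeedItem
{
    public string Source { get; set; }
    public string Title { get; set; }
    public DateTime Received { get; set; }
}

// Desktop-side store: acquisition threads add items, client requests read them.
public sealed class FeedStore
{
    private readonly List<FeedItem> items = new List<FeedItem>();
    private readonly object gate = new object(); // writers and readers run on different threads

    public void Add(FeedItem item)
    {
        lock (gate) { items.Add(item); }
    }

    // Newest first, capped so a single response stays small for the tablet.
    public List<FeedItem> Since(DateTime lastSync, int maxItems)
    {
        lock (gate)
        {
            return items.Where(i => i.Received > lastSync)
                        .OrderByDescending(i => i.Received)
                        .Take(maxItems)
                        .ToList();
        }
    }
}
```

Keeping the sorting on the desktop means the client only pays for one small request per refresh, which matches the goal of not draining the Archos battery.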
I already have the client/server communications working between the C# server framework on the desktop and Java home screen widget on the Archos. I was very happy to see that when developing on Android you are not forced to use the old Java layout managers, the XML layout files are a godsend for those of us who worked in early versions of Swing. My next tasks involve the database design and data retrieval subsystems which I expect to take around a week. Then I will start passing live data to the handheld and begin work on the grouping logic. More updates soon!
Jul 10th
Dom has recently updated the Operation Unbrickable firmware for the Archos 7 to version 3. This is the same firmware I used in my first Hacking the Archos 7 Home Tablet post except he has now corrected the issues with wifi setup by holding off on the Google setup wizard until you first launch one of the applications. The Google Contacts application provider has been hacked to work now, as well, but I have yet to fully test that personally. (At some point I will migrate my contacts, until then they are stored in my local Microsoft Exchange server.) There have been reports from users that it works flawlessly though.
In the coming days he has also promised us a rooted version of the firmware that will allow for more storage space and will come preinstalled with Apps2SD support. This will open up a lot of room for applications on our devices and will hopefully prevent us from losing everything between firmware updates. If you own an Archos 7 HT you need this firmware.
Jul 10th
In today’s social web, users have come to expect a certain level of integration from the websites they visit, and that integration has become key to helping your users spread the word. Bringing a totally integrated social experience to WordPress is not as hard as it was a few years ago, but wading through the garbage and building a complete strategy can still be daunting. The best way I can think of to handle this is to build a comprehensive list of all possible integration points and then attack them one at a time via plugins and theme modifications. One of the primary things I will keep in mind is that I do not want to overwhelm the user with share buttons that destroy the usability or visual appeal of the website.
It is possible to overdo social media integration, so pick a few of the top sites to focus on. For StupIdiocy I will be focusing on Facebook and Twitter integration; other sites might find LinkedIn, Foursquare, or others more relevant depending on the primary use of the site. Our primary goals in the social media integration will be:
Jul 8th
The built-in Northstar system for the Rovio allows us to track the current position of the robot in real time using IR room beacons. When I started the Automated Mapping System for my Home AI I ran into a few issues, mainly that as the Rovio gets further from a beacon, the positional data becomes less reliable. I realize this is to be expected, but it makes mapping the robot's dataset to a real-world view of the home a bit difficult. That is when I realized I don’t need the robot to see the world the way I see it; I need it to see the world in a consistent manner. Even if I perceive the dataset to be flawed, the robot sees it as perfectly valid data time and time again.
With this in mind I redesigned my internal use of the data and cleared the position database. In my prior version I was only storing clear and blocked paths in front of the robot, but I have now expanded the database to store Known Clear, Assumed Clear, and Blocked locations, which are displayed in green, yellow, and red on the control panel. I am assuming that the calculated x, y position of the Rovio is free of obstacles (since the robot is sitting there), and that the sensor data refers to a point 6″ in front of the Rovio. While the Rovio will report an obstacle at a much greater distance, I can handle that when the Rovio revisits that location from a different angle, so even though my data is not perfect at the start, it only takes a few passes before things start filling in.
A quick scan of the Rovio API Documentation tells me that the x, y positions are between -32767 and 32768, so I assume that 0, 0 is directly under the beacon on the ceiling. An older thread on RoboCommunity confirmed my assumption, as you can see in this graphic created by milw.
Now that I knew exactly how the positional data worked, calculating a point in front of the Rovio was a matter of some simple math.
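// Project a point 'distance' units ahead of the Rovio along its heading
// (assumes theta is in radians, which is what Math.Sin/Math.Cos expect)
double offsetx, offsety;
offsetx = x + (distance * Math.Sin(theta));
offsety = y + (distance * Math.Cos(theta));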
After running some quick tests I realized that the base y coordinate was reversed. I am not sure whether this is due to something I did, but I implemented a quick fix.
y *= -1; // flip the reported y axis before computing the offset point
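A sketch of how the three cell states, the offset math and the y flip could fit together, assuming each report arrives as x, y, theta (radians) plus a boolean obstacle flag. `RovioMap`, `CellState`, `cellSize`, `sensorDistance` and the overwrite rules are hypothetical, not the control panel's actual code.

```csharp
using System;
using System.Collections.Generic;

// Displayed as yellow, green and red on the control panel.
public enum CellState { AssumedClear, KnownClear, Blocked }

// Hypothetical grid map fed by Northstar position reports.
public sealed class RovioMap
{
    private readonly Dictionary<(int, int), CellState> cells = new Dictionary<(int, int), CellState>();
    private readonly double cellSize;       // Northstar units per grid cell (assumed)
    private readonly double sensorDistance; // Northstar units matching the ~6" sensor offset (assumed)

    public RovioMap(double cellSize, double sensorDistance)
    {
        this.cellSize = cellSize;
        this.sensorDistance = sensorDistance;
    }

    private (int, int) CellAt(double x, double y) =>
        ((int)Math.Floor(x / cellSize), (int)Math.Floor(y / cellSize));

    // Record one report: position, heading in radians, and the obstacle sensor flag.
    public void Record(double x, double y, double theta, bool obstacle)
    {
        y *= -1; // same y-axis flip as above

        // The robot is sitting here, so this cell is known to be clear.
        cells[CellAt(x, y)] = CellState.KnownClear;

        // The sensor reading applies to the point sensorDistance ahead.
        var ahead = CellAt(x + sensorDistance * Math.Sin(theta),
                           y + sensorDistance * Math.Cos(theta));
        bool known = cells.TryGetValue(ahead, out CellState current);

        if (obstacle)
        {
            // A cell the robot has driven through is never marked blocked again.
            if (!known || current != CellState.KnownClear)
                cells[ahead] = CellState.Blocked;
        }
        else if (!known)
        {
            // Only unknown cells become "assumed clear"; earlier evidence wins.
            cells[ahead] = CellState.AssumedClear;
        }
    }

    public CellState? Get(int cellX, int cellY) =>
        cells.TryGetValue((cellX, cellY), out CellState s) ? s : (CellState?)null;
}
```

Because the robot's own cell is always overwritten with KnownClear, a false "blocked" reading from a distant obstacle gets corrected the next time the Rovio actually drives through that spot, which is the "few passes" behavior described above.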
I increased the size of my brushes and ran another few laps around the computer room to verify the changes and everything looks perfect. Well, almost. It seems I was a bit too generous with my distance factor for the obstacle sensor. You can see the problem in the following image, represented by the large patches of “assumed clear” sections between the “known clear” and “known blocked” areas.
Now that I know the mapping is working properly, I can unleash both of the robots tonight in the basement. Hopefully my “disable mapping and go home” logic works when the battery runs low, guess we’ll find out.