Author Archives: Ulrich Scheller

Concurrency in java (german)

Category : Android java

The Free Lunch Is Over – that is the famous headline of an article about how the evolution of hardware alone will not solve our performance problems anymore. With the rise of multicore CPUs, software developers have to put effort into their code, to see further performance gains. Even on mobile phones, as the Tegra 2 shows, we will see more and more multicore CPUs. Therefore, there is a good reason for using concurrency in your application. However, it can also be dangerous and lead to unpredictable errors.


The following is a talk I have given to co-workers about concurrency in general and how to do it in java. It focuses less on academic problems like deadlocks, but shows how painful multi-threading is with the standard library. And there are some best-practices for how to reduce the likeliness of errors.


Building a very easy text classifier in python

Category : python

Some of the developers at match2blue are creating a text-interest-matcher. Leaving buzzword bingo aside, that means the software calculates whether a text is interesting based on users’ interests. So basically you, as a user, have to enter some interests and will be presented some pieces of data in order of their relevance. You can also think of it as text classification into either good or bad.

This software has become quite complicated, because it is necessary to have some kind of semantic knowledge about the interests. But there are different methods of text classification. I was curious about how hard it is to write the most simple text classifier that gives you decent results. Well, turns out it is remarkably simple. Creating the classifier took less than two hours. And here is the source code:

import os
j = os.path.join

def train(text):
	Train a dictionary with the given text. Returns a dictionary of dictionaries,
	that describes the probabilities of all word-folling-ocurrences in the text.

	For example, the string "a test" gives this result:
	>>> train("a test")
	{'': {'a': 1}, 'a': {'test': 1}}
	Meaning that the empty string '' has been followed once by 'a' and 'a' has been
	followed by 'test' as well.

	A longer example leads to a more complex dictionary:
	>>> train("this is a test oh what a test")
	{'': {'this': 1}, 'a': {'test': 2}, 'what': {'a': 1}, 'oh': {'what': 1}, 'this': {'is': 1}, 'is': {'a': 1}, 'test': {'oh': 1}}
	c = {}
	lastword = ""
	for word in text.split():
		word = word.lower()
		if c.has_key(lastword):
			inner = c[lastword]
			if inner.has_key(word):
				inner[word] += 1
				inner[word] = 1
			c[lastword] = {word: 1}
		lastword = word
	return c

def probability_of(dict, lastword, word):
	Helper function for calculating the probability of word following lastword
	in the category given by dict.

	>>> category = train("this is a test")
	>>> probability_of(category, "a", "test")

	>>> probability_of(category, "any", "words")
	word = word.lower()
	if dict.has_key(lastword):
		inner = dict[lastword]
		sumvalues = sum([v for v in inner.values()])
		if inner.has_key(word):
			return inner[word] / (sumvalues * 1.0)
	return 0

def classify(text, dict):
	Returns the probability that a text is from the given category. For every pair of
	words the probability_of value is calculated, summarized and divided by the amount
	of words in the text.

	>>> category = train("this is a test")
	>>> classify("a test with some words", category)

	>>> classify("just writing test or a doesn't improve the ranking", category)
	lastword = ""
	probabilities = 0
	for word in text.split():
		probabilities += probability_of(dict, lastword, word)
		lastword = word
	return probabilities / (len(text.split()) * 1.0)

if __name__ == "__main__":
	Calculate the category, that the text in ../test matches best.
	ranking = []
	for file in os.listdir("categories"):
		trained = train(open(j("categories", file)).read())
		value = classify(open("test").read(), trained)
		print "test is", file, "with", value, "% probability"
		ranking.append((value, file))

	print "The test text is probably from", ranking[-1][1]
	print "(second guess is", ranking[-2][1] + ")"

How does it work? There are two very simple steps it does. First, the classifier has to be trained with existing textfiles. The result is a dictionary that consists of many inner dictionaries. Let’s feed it with some text to see what happens.

In [5]: train("a test a good test a good good test")

{'': {'a': 1},
 'a': {'good': 2, 'test': 1},
 'good': {'good': 1, 'test': 2},
 'test': {'a': 2}}

In the result we can see that three words followed ‘a’. Two times it was ‘good’ and one time ‘test’. This is all we need for classifying. Now we can apply the classify function. It goes through the text to classify and looks for known word follow-ups. If there is a known one, the probability of this follow-up is added. So, in our example the probability of ‘a good’ is 2/3 and for ‘a test’ it is 1/3.

In [2]: a = train("a test a good test a good good test")

In [3]: classify("is it a good test", a)

Out[3]: 0.26666666666666666

In [4]: classify("text good but different style", a)

Out[4]: 0.0

The first example has similar words and ordering as the trained text. The second one also has some exact same words, but they are in a different order. Therefore, the probability of this text beeing a is 0.0.

If you want to try it with some longer text, you can download the classifier from Google Code. It is easy to create your own categories by adding a file in the appropriate folder. Just make sure you have a decent amount of data. Three sentences are not enough for a good classification.

In my tests I got quite good results even for classifying authors writing about the same topic.

cd classify/src/

Things I am missing in Android development

Category : Android

This is a list of things I would like to see for Android to improve the development of apps. Don’t get me wrong, Android is one of the best platforms out there, but this doesn’t mean it can not improve anymore.

Hot Code Replacement
It always takes a while to deploy your application on the mobile phone or in the emulator. If you just want to find out what is happening in the code this deployment cycle adds up to quite some time. I know it is hard to replace Dalvik code from a Sun Java machine, but it would help a lot. We even had hot code replacement with our Robocup machines while they were playing soccer. Another possibility would be to use something like hotswap for python. It monitors changes of source files and applies them to the running process as soon as possible.

Flawless USB Connection
There are many cases where you have to unplug and plugin the development phone again. For example when I adb uninstall an app, it can not be installed again until the USB connection has been reset. The same goes for installing an apk from the SD card. You have to connect the phone as an external device, copy the apk and unplug again, before installing the app.
Another thing is that LogCat most of the time does not recognize when the phone is plugged in again. So you always need to switch to the DDMS view in Eclipse, click on the device and go back to the Java view again.

Simple caching of files
Some time ago I had this question on StackOverflow. Disk space is extremely limited on current Android phones. On the iPhone you can easily bundle 20 Megabytes of data with your application, because the app partition is huge. In contrary, most G1 users don’t even have 3 Megabytes of space on their phones. So until every Android device has a huge app partition (or could use the sdcard for it), we need to cache files and delete them on demand. There should be a standard way of caching, like RomainGuy described it with SoftReferences for storing images in memory.

Just in Time Compiler
Now this one is already in the works, but it might still be a while until it sees the light of day. But obviously the overall performance is not even close to the iPhone right now. A JIT compiler should be able to nearly close the gap.

Hardware Graphics Acceleration
The hardware of most Android devices is capable of handling a lot of 3D objects and textures with high framerates. In the standard user interface, this acceleration is not used. Hardware acceleration with OpenGL should speed up the overall performance a lot.

What kind of things can you think of? Put your own wishlist in the comments.

(this entry is cross-posted from my old blogger site)

Android Fundamentals talk

Category : Android

This is the screencast and voice of my Android Fundamentals talk from december 2009. It was given on the 29th floor of our jentower for the towerbyte.

“Read More”

Android: Deploying multiple targets from one project (outdated)

Category : Android java

Update: This way of deploying multiple targets is considered outdated. There is a better way now.

This posting is about how to create multiple versions of your Android application without cloning the whole project. For example if you want to create a full (paid) app, as well as a lite (free) version of it, you might want to automate the task of switching between them. Both versions should be able to use different graphics, different strings and even different featuresets.

First of all, what causes trouble with multiple targets on Android is the auto-generated source code and the strict checking of Java. Strings and graphics are all kept in one place, namely the res folder. Simply creating one res folder for each target and switch between those folders solves the problem with all resource files. I will give an example Ant-script for this later on.

So, having different resource files seems easy. But there is one more problem. We want to have two different applications, so both targets don’t replace each other on our phone. Meaning, the targets need a unique package name in the AndroidManifest.xml. And it’s getting worse. When changing your applications package name, you also change the package of the automatically generated R file. This R file usually is referenced in a lot of source files – basically everywhere you need graphics, strings or other resources. So you end up editing a great amount of your sourcefiles when changing the application package name.

What is the trick over here? Well, I don’t have any. My approach goes through every Java file and changes the import statement for the R file:

There are several targets in this Ant-script. The default target is myproject, the others depend on the default target (for example myproject is always executed before the otherResources target). myproject does the following:

  • deletes the current res folder
  • copies its own customized resources into res
  • replaces all ocurrences of “import*).R;” with “import;”. This might be necessary if the iamdifferent target changed it to “import;” for example.
  • the package name in the AndroidManifest is changed to the target name

Just execute a target to switch to this version of your app. You need to refresh the project in order to reflect the changes. In Eclipse, this can automatically be activated by setting the following option:

One more thing that does not work out of the box is the android:name parameter in activity declarations of the AndroidManifest. When auto-generated, they are declared relative to the project package. This doesn’t work anymore when the package name is changed. Therefore you have to set the activity name with absolute values. Instead of

<activity name=”.Test”>

You have to write something like

<activity name=””>

For all of you who like free lunch, here is an example project with two targets ready for download.

This is how three targets of the same app look like:

Thats it. With this approach you can deploy as many customized versions of the same project as you want. If you are missing a step or know an optimization of this, please leave me a comment.

(this entry is cross-posted from my old blogger site)

Recent Posts