Sunday, November 28, 2010

Gnumeric isprime() trial division

Gnumeric is one of several spreadsheet choices available to a Debian and / or GNU / Linux user.

How useful is Gnumeric to a mathematician? Quite.

It does include several functions that a more general package LibreOffice / OpenOffice Calc / Excel might not include.

Statistics and Number Theory are areas where Gnumeric currently outpaces the competition, in terms of available functions.

Here is one example, the isprime() function:
/*
 * Returns -1 (out of bounds), 0 (non-prime), or 1 (prime).
 */
static int
isprime (guint64 n)
{
      int i = 1;
      guint64 p = 2;

      if (n <= 1)
            return 0;

      for (i = 1; p * p <= n; i++) {
            if (ithprime (i, &p))
                  return -1;
            if (n % p == 0)
                  return 0;
      }

      return 1;
}

Do not be alarmed if you are not comfortable looking through the above code, it is simply there so that I may elaborate next.

The method here is trial division, which is a reasonable way of going about things for integers say of a billion or so. It is implemented in C and so is likely pretty fast at getting it's answer.

In fact a little bit of trial and error can show (very roughly) where the threshold for this isprime() function lies:

Firstly a candidate number (fairly large by spreadsheet standards):

factorial(18)+1=6402373705728001

which isprime() answers with #LIMIT!

Now those handy zeros at the end of the number will serve to reduce the number in size a little. Removing a zero from ...001 give ...01 and a new number:  640237370572801

...which Gnumeric isprime() is happy to process and return FALSE.

If you are wondering, then the new number ending ...2801 has a factor of 2801, how interesting, and completely accidental.

Anyway, I have provided a 15 digit number that isprime() will process, and demonstrated that it does have a limit to what it will attempt.

So no, you are not going to find huge primes by using entries in a Gnumeric spreadsheet, however it does provide very convenient environment, for exploring integer construction.

Should you use Gnumeric, to test out your hypothesis for prime number patterns?
Well why not, there is value in knowing that constructive forms hold true up to 15 digits certainly.

Later you will maybe explore a dedicated number theory program, there are several great open source offerings. However Gnumeric seems like a great place to just dive in.

Links and further reading: 

Tuesday, October 26, 2010

Problem solving - when really stuck try...

  • A cup of tea/coffee
  • Social chat with a colleague
  • Walk in the *fresh air
  • Visit a huge building with great architecture/decor
Most of those points are really obvious, but all too easy to forget, when engrossed in a tough Mathematical problem.

*Without realising it you might be really need the fresh air!
( If you work in a Science building and have your own small office, then sitting in there all day with the door closed, will reduce the air quality. )

Visiting the Science Museum in London, a local historic building, or a local Cathedral usually works for me also.
( The Cathedral is not my way of seeking 'something divine', but they are usually architecturally interesting and generally not full of hustle and bustle )

For some, a Distraction can also work, and there is a Psychological Science Article summary here giving some details.

The full 2008 research paper by Chen-Bo Zhong, Dijksterhuis, and Galinsky can be found at utoronto.ca here.

Another option, which sometimes works, is to pick a marginally related problem, and plan to come back to the original in a few days time.

Sunday, October 10, 2010

gmp library - including methods in your c code

.
The gnu Gmp library has many methods for dealing with huge integers.
( It does a lot more than that but I keep a narrow focus for this article )


Here are some links to the documentation for versions 5 and older version 4

When writing your own programs in C, that make use of gmp, you will use an include such as:

#include <gmp.h>

...and then perhaps if working with mpz Integers you might use some of the Integer functions of gmp.

Diving in and compiling with gcc, you might see 'undefined reference to ...' messages, until you have the gmp headers installed, and tell gcc about that.


Compiling against the gmp library is different than simply using the gmp library, and you may need an additional package installed.

On Debian and Ubuntu the source files and/or headers are kept in separate packages from the executables, so as not to bloat your system.

Install the source for gmp using the following command:

apt-get install libgmp-dev

(*Older Debian 6 Squeeze package is named libgmp3-dev rather than libgmp-dev)

...which brings down the necessary files so that gcc can use them in it's compile.


I used aptitude rather than apt-get in the above but the result is the same.

Recap:
  • Use in your C program #include <gmp.h>
  • Have the dev package installed on your system
  • Use the -lgmp flag when you invoke gcc
  • gcc arguments -lm and -lgmp should be the LAST argument
      [ as shown in the first screenshot ]

Note: I use GNU/Linux and give my instructions for Debian and similar systems. If you use a different platform, then I am unable to help with that and you should look elsewhere for instructions tailored to your chosen platform.


(optional) Integer functions of gmp:

What follows are some hints/personal notes, which serve as a useful reminder to me personally.

( Please consult the official manual, using the links in the first paragraph, as a definitive source of explanation. )


/* mpz_sub_ui versus mpz_sub where ui indicates unsigned integer rather than mpz processing */

/* mpz_cdiv_q versus mpz_tdiv_q will ceiling rather than truncate when calculating quotient */

/* mpz_cmp_ui Compare op1 and op2. Return a positive value if op1 > op2, zero if op1 = op2, or a negative value if op1 < op2. */
.

Thursday, September 30, 2010

5 seconds to get million digits of pi - easy benchmark

Here is a great way to benchmark a new system, or pick the leader from a pool of desktop systems.

Okay so you are planning to do some mathematical exploration and you have at your disposal a laptop, and a couple of desktops.

Which one to use?

You can be very scientific about this (more later), or you can just press a few keys on each and pick the winner.

  • apt-get install python-gmpy
  • cd /usr/share/doc/python-gmpy/examples/test
  • python test_large.py
5 seconds for the desktop Amd 7850 which is in a year old Asus P2-M3A3200
25 seconds for the laptop, an Intel T5550 based Dell and is 2 years old
My other desktop came inbetween those two, just slightly beating the laptop.



The winning system is running Debian Squeeze 64 bit and Python 2.6.6:



I mentioned earlier about this being a quick but unscientific way of choosing a suitable machine for you mathematical task.

You will perhaps have a particular task in mind. If it is linear algebra based then you might want to run some sample tests using octave, or scilab, or sage.

Here is a useful link to benchmarks that include some linear algebra/matrix operations:

Whilst looking around for benchmark results, I came across this 2007 post, that includes a 300x300 matrix timing test, and here are my results:



(optional) more technical specs for the Asus:
  • Chipset is AMD 780G
  • Socket is AM2+
  • Onboard Graphics are Radeon HD 3200
  • Memory installed is PC2-6400 DDR2-800 

( The above is by no means cutting edge, but for a £200 desktop system, as the benchmark shows, results are not too shabby )

(optional) Matrix commands in text format if you would like to repeat that 300x300 test in your own program:

M = rand(300,300);
t=0.25; T=2;
tic; inv(M) * (expm(-t*M) - expm(-T*M)); toc;

( I hope I made it clear from the link earlier that those commands appeared on a forum at wilmott.com
In particular blondie and spacemonkey posters used those commands for timing tests )

( The install commands such as apt-get which I have used to install gmpy are specific to my platform GNU/Linux. I am unable to provide help with other platforms, and suggest you might want to find a more suitable howto, that gives useful benchmarks including an install guide for your platform. )

Wednesday, September 15, 2010

sage 'open source mathematics' - some compilation experience

sage (or sagemath) is a unifying front-end to many computerised algebra and mathematical programs.
Previously I have used sage on Ubuntu Karmic and loved it.

Just now there is no official Debian or Ubuntu package for sage, and so installing it on my new system (Debian Squeeze) gave me a choice:
  1. Install the prebuilt binary package that sage themselves provide
  2. Download the source and issue 'make' on my own machine.
There is little advantage to building from source (option 2), unless you will be running mathematical procedures that run over several nights.

In particular if you are new to sage then the prebuilt binary packages are the 'no hassle' way of getting up and running.

If you just want to see what sage is about, then the live cd, is an excellent way to try things out (no change to your system at all)


Compiling and having sage 'self check':

Summary: make test

Note: I work on GNU/Linux and so my instructions are particular to that platform. Seek out the install guides at sagemath.org if you want help with another platform.

Download the latest .tar file from a mirror listed here.
( My .tar was around 300MB and I checked its md5sum matched against the site )

Untar into /usr/local/ or /opt/local/ or your home directory, as you prefer
( I will use /opt/local for this article )

Have a quick look at the readme file at /opt/local/sage-4.5.3/README.txt

Follow the install guide at sagemath.org in conjunction with this article.

I mentioned about getting sage to 'self check', and what I meant by that, was running the inbuilt mathematical tests that come with sage.

Glancing at README.txt you might be thinking of setting SAGE_CHECK="yes", however this is a different 'self check'*

( *Sage is made up of a number of .spkg files and SAGE_CHECK="yes" will initiate integrity checks on those .spkg files and checks for compatibility with your system )

Running make test initiates a two stage process:
  1. The build
  2. mathematical test suite
The test suite (step 2) took a couple of hours on a 2.8 ghz machine running Debian Squeeze 64 bit.


When you are all done then you might want to issue the following command:
chown -R someuser:root /opt/local/sage-4.5.3/

( replace someuser with the the username you normally log in with )

*** End of main article. ***
( The next section is for those interested in lower level details of sagemath source and .spkg testing in particular )

Bundling python within an application and compatibility:

Sage does not use Python that already existed on your system, and instead, bundles Python 2.6.4 to support all sage applications.

This is a perfectly valid approach* but does freeze in time, that bundled Python, against a moving target (your system and upgrades)

( *as a Debian user I would really like a version of sage that integrates more fully with optimised libraries debian itself took the trouble to install. This is not currently the case and I understand some of the why and why not involved. )

In particular Python 2.6.4 bundled in sage 4.5.3 is probably missing a patch to the file test_zlib.py

Unpatched (as supplied by sagemath.org) my output using SAGE_CHECK="yes" was:
test test_zlib failed -- Traceback (most recent call last):
  File "/opt/local/sage-4.5.3/spkg/build/python-2.6.4.p9/src/Lib/test/test_zlib.py", line 84, in test_baddecompressobj
    self.assertRaises(ValueError, zlib.decompressobj, 0)
AssertionError: ValueError not raised

322 tests OK.
3 tests failed:
    test_distutils test_httpservers test_zlib

...but then applying the patch with a manual edit to the file test_zlib.py


...sees test_zlib outcome being different as shown:

test_zlib
323 tests OK.
2 tests failed:
    test_distutils test_httpservers

When the next version of sage (4.6) is officially released, hopefully it will include Python 2.6.5 or newer, so as to have that patch already applied.

Links and reference:

Wednesday, March 24, 2010

Real World Probability - failure rates and social networking

People sometimes have difficulty understanding the relevance of Mathematics in today's world.

One way of overcoming this is to take the time to appreciate (and talk about) real world examples.

This is only a short post just to give you a pointer to something interesting related to Facebook.com - something lots of people just leaving the education system will have had exposure to.

Social networking runs atop of massive datastores, these datastores are at the leading edge of current computing theory, with regard to latency and failure rates. Mathematics is at the heart of understanding the probability of failure.

Facebook is making use of a datastore named Cassandra and the Gossip protocol - what these are is not really so important (from a Mathematics perspective). What is interesting Mathematically is the probability theory involved in the operation of the Failure Detector.

There is a great presentation by Avinash Lakshman and Prashant Malik, which you should be able to locate by websearching using this link.

The mathematics begins around page 17 of 24 titled "Properties of the Failure Detector"

( You don't have to really understand the content - just skipread the slides and ignore much of the jargon. )

The key point of this article, is hopefully, that Mathematics is being put to use today so that you can get your Facebook quicker :)

(Note: Twitter and Buzz are my social networking choices, but that does not stop me lauding Facebook and their use of Mathematics to achieve their business goals)