============================================================================= Typing in Java and Python - A Comparative Analysis ============================================================================= :Author: Mikael Jansson :Course: Topics in Programming Languages (TDA260), Chalmers University of Technology :Date: January 17th, 2006 Typing in General ================= There are two main classes of type systems in programming languages, namely monomorphic and polymorphic. The former means that a function can only be applied to values of specific types, whereas the latter means that a function can in some way be applied to values of more than one type. Polymorphism can be divided into two main classes: universal and ad-hoc, where universal can be either parametric or inclusion-based, and ad-hoc can be implemented through overloading or coercion ([Cardelli 85], pp. 4). Parametric polymorphism means that a function parametrisizes its type, such that it could be used for any value if instantiated for that particular type first. Inclusion polymorphism is what object-oriented languages utilize through inheritance: a function that can be applied to a more general type can also be applied to the more specific type (i.e., lower in the inheritance chain). Ad-hoc polymorphism is implemented either by means of overloading function names, or by coercing values. The name itself means that a function may seem to have universal polymorphism, while in fact it might not hold for all types in a specific type class. In the case of overloading, a function can have more than one different type signature, but is in fact, only syntactic sugar for having the same name for different functions -- an implicit dispatch table, if you may. There is nothing that guarantees the same behaviour for overloaded functions, and they may in fact do the opposite of any other function with the same name. Static typing enforces that a function cannot be applied to a value if there is no function with the correct type signature. Coersion is when there are rules for transforming a type into another type for which there might be a function suitable, e.g. the ``+`` operator which might only be defined for real numbers, and in order to make an integer addition a coersion to real has to be made. .. [Cardelli 85] "On Understanding Types, Data Abstraction, and Polymorphism" Luca Cardelli 1985 Java ==== Versions previous to 1.5 have monomorphic functions, but achieves universal polymorphism through inclusion. In the latest iteration, however, types can be parameterized through the concept of generics. Type variables were also introduced. Furthermore, there are two kinds of types: primitives (such as integers and booleans) and Objects. The Java class library -- especially Collections -- operate on the primitives' object counterpart, e.g. Integer. In 1.5, there's a new feature, *autoboxing*, to let the compiler deal with the problem. Generics -------- Quantification on the type variable:: int sum(Collection xs) { int s = 0; for (Iterator i = xs.iterator(); i.hasNext();) { s += i.next();} return s;} Eliminates unsafe type casting; in previous versions there was no way of being certain a collection would only contain elements of a specific type, short of explicitly iterate over all items and verify their types. This makes code easier to use, as it imposes less pre-conditions on the user of the functions -- types are checked at compile time instead of run time, i.e. by manual type casting. The type information in generics is only available at compile time, after which the object only has the instantiated type. This for backward compability with code not utilizing or knowing about generics; as such, generalized types can be treated (in fact, at run time, are) "raw" Java types. Legacy code can, however, in the case with the above ``sum`` function, insert non-integers into the collection, and only when executing ``sum`` will there be a failure with a ``ClassCastException``. There exists special Java classes for attaching run-time checks that no invalid values can even be inserted to the collection, which can be used where needed. Generics in Java can also contain wildcards for universal quantification:: int length(Collection xs) { int l = 0; for (Iterator i = xs.iterator(); i.hasNext(); i.next()) { l += 1;} return l;} The quantification (both wildcard and regular types) can have a lower or upper bound, through the use of the ``super`` and ``extends`` keywords, where the former imposes a lower bound and the latter an upper bound. We modify the sum() function to give an example of ``extends``:: int sum(Collection xs) { int s = 0; for (Iterator i = xs.iterator(); i.hasNext();) { s += i.next();} return s;} Lower bounds are useful for e.g. operating on generic data structures. The following function declaration would let the user add an object x of type T into any queue structure that supports adding a super-type of T:: int addToQ(T x, Q q) { ... } Classes and Interfaces ---------------------- Classes in Java utilize existential quantification in that the public methods of the class represent the general interface, immediately instantiated by the code in the methods themselves. This achieves information hiding, by not exposing all of the actual implementation of the code, allowing for private methods and state to be used internally as helpers. Interfaces allows for i) inclusion polymorphism and ii) existential quantification as it enables functions to parameterize on the any sub-type of the interface while hiding the the actual type/implementation of the interface. Any given class can extend zero or more classes, but only implement the behaviour of zero or one interface. .. Autoboxing .. ---------- .. Java has primitive types which aren't objects, and e.g. the Collection .. interface in Java operates on Objects. As such, primitives need to be .. converted into its corresponding Object (Integer for int, etc.), before it can .. be used. Autoboxing aims to remedy the hassle of explicitly converting by .. providing with coercion rules for primitives to the Object-based types. .. However, this causes problems in situations such as equality checking, where .. == is used to check for *object identity* rather than unboxing the value and .. comparing the primite value. In this case, .equals() should have been used .. instead. There is also a performance penalty involved: for integers, automatic .. unboxing is not performed for byte-sized values, otherwise the penalty is of .. factor ~10. Python ====== Unlike Java, there are no primitives -- everything is an object. Python has classes, but no interfaces. Instead of a static type system, Python utilizes *late binding* for attribute access, causing a run-time exception should the object not support the given operation. In fact, that's a common idiom for checking for the existance of variable names:: try: foo except NameError, x: # name 'foo' not in use pass In Python (like Java), variables are reference, but the name itself does not have a type attached to it -- all it does is point to an object. Links has to be followed to give the return value. Because of the late binding, Python does not use interfaces, but instead relies on naming conventions, even if there's some work (Zope's zope.interface, most notably) to get a more formalized interface. For example, given the following function which counts the occurance of the literal '1' in a file:: def countOnes(f): return len([i for i in f.read() if i == '1']) ``lineCount`` can be used with file-*like* objects supporting the operation .read(). We could therefore define a class that's not a file at all, but still supports the read operation:: class RandomSequence: def __init__(self, count): self.count = count def read(self): return [str(random.randint(0, 9)) for i in range(self.count)] rs = RandomSequence(100) print "Out of 100 numbers,", countOnes(rs), "were ones." Objects can be dynamically altered at run time; adding and removing attributes to objects is a common task in Python. This is form of universal polymorphism, achieved through the *overloading* ad-hoc polymorphism. Magic names are used for supporting operators in the language. For example, the operator ``+`` is called ``__add__`` and can be added to any class definition. Python also supports coercion for certain types, such as addition with integer and real values producing a real value.