Sense of Wonder | 02-2004 archives

David Cerezo's Weblog
↑ 03-2004

27-02-2004

Computer tools shaping architectural creations

  In the 20th century, architecture has had a big departure from its previous history: on one side, material science has permitted the use of former impossible forms and, on the other side, CAD/CAM tools (Computer Aided Design and Manufacturing) has released the creativity of architects.
  A close examination of the different underlying mathematics on CAD/CAM tools will show that they liberate and constrain architects at the same time. Roughly, three internal mathematical representations are implemented by CAD/CAM tools:

  Ancient, classical and traditional architecture (Greek and Roman Temples, …) and some instances of modern architecture (Frank Lloyd Wright, Richard Meier, …) describe forms using simple addition and subtraction of spheres, polyhedra and simple solids (cube, cone, cylinders, …)

The Gordon Strong Autmobile Objective, Sugarlof Mountain, Maryland, 1924-1925 Taliesin, Spring Green, Wisconsin High Museum of Art, Atlanta, Georgia, 1983

  Using these methods, forms are described by points lying on the surfaces, then connected by means of triangles or curves. The architect Zahah Hadid is a representative example of what can be done with them.

 Vitra Fire Station, Weil Am Rhein, Germany LFone/Landesgatenschau, Weil Am Rhein, Germany, 1999

  A CAD/CAM tool used in the aerospace industry called CATIA was instrumental in the conception and design of Guggenheim Museum in Bilbao, Spain, by Frank Gehry and his team of architects. Another one very similar to this is the Walt Disney Concert Hall.

Guggenheim Museum, Bilbao, Spain Walt Disney Concert Hall, Music Center of Los Angeles County

Twelve photographs

  With all these interactions between mathematics, computer science and architecture, I wonder if one day architects will start using algebraic geometry to describe their buildings or if constraints satisfactions languages and some kind of a formal specification language for architecture will be combined in a DSL (Domain Specific Language) to help architects or even to replace them.

25-02-2004

A critique to OSSTMM (Open-Source Security Testing Methodology Manual)

What is OSSTMM?
 From the website:
The Open Source Security Testing Methodology Manual (OSSTMM) is an open standard methodology for performing security tests. Since it’s inception in January 2001, the OSSTMM has become the most widely used, peer-reviewed, comprehensive security testing methodology in existence. While other methodologies and best practices attack security testing from a 50,000 foot view, the OSSTMM focuses on the technical details of exactly which items need to be tested, what to do during a security test, and when different types of security tests should be performed. The OSSTMM provides testing methodologies for the following six security areas: Information Security, Process Security, Internet Technology Security, Communications Security, Wireless Security, and Physical Security.

ISECOM has created three certifications directly related to OSSTMM:

This mean, as I will show here, that ISECOM is in the security business, just for money, and the claims that the OSSTMM is the “most widely used, peer-reviewed, comprehensive security testing methodology in existence” are just marketing hype.

Does the peer-reviewing process add any value to the methodology?

 The plain answer: no, it doesn’t. A long answer requires a careful reading and understanding of the security testing process, to pinpoint some problems in this methodology. Some pressing issues are:

- The methodology is incomplete, an incredible inconvenience given that it began more than 3 years ago. Does it really take 3 years to write 128 pages? I doubt it. It doesn’t contain any knowledge that can’t be found elsewhere, any magic recipe or special insight.
- Life cycles, when given, are just too long. There is no explanation on how they were inferred, they may be overcharging costumers.
- There are no attack trees, and the one that can be deduced from the methodology does not match the actions of an intruder: in some instances, it can be very helpful to simulate the stealthiness of the typical or sophisticated attacker, to better understand where do the intrusion detection systems fail. Thus, in the first place, the sophisticated attacker will try to evade the intrusion detection, but this methodology is too noisy to pass unnoticed.
- The conventional wisdom is that denial of service attacks cannot be neither stopped nor prevented, but for the real expert in the computer security field, denial of service attacks are no source of despair (see my presentation on the different techniques to deal with distributed denial of service attacks). In the OSSTMM, denial of service attacks are shown to exist, with no countermeasure against the symptoms. Is the peer-reviewing process working here?
- The methodology suffers from technology hypes (a strong indication of  the bad IT professional): why do they talk about RFID issues when the methodology isn’t finished? Who uses RFID? How many RFID early-adopters have been hacked?
- Cryptography is an obscure area in the methodology: there is no detection of snake-oil products and no mention to the incipient need of public key infrastructures. From the text, I infer that these so-called “experts” assume that cryptography are just some simple methods of authentication.
- A central point to a security methodology is to guarantee that anyone certified by the process has achieved some level of security. But since there are no standard techniques to test for the reconstruction of information from electromagnetic emissions, what would be the results of these tests? What are the minimum recommended levels of electromagnetic emissions aside NIST and FCC recommendations?
- Code inspection, cross-site scripting or searching for buffer overflows in proprietary code do not form part of OSSTMM.

In conclusion, a professional security penetration of any kind is not an enumeration of the services and their versions, as many seen to think: security tests are not inventories of the electronic infrastructure. Sure, they must inspect the infrastructure from the system level to the last line of code, but to understand its complexities and derive proactive protections, usually involving complex cryptographic schemes: this is the main reason computer security is so hard.

23-02-2004

Symmetry in Music

Music for Strings, Percussion and Celesta
Tonal reflection in the concluding statement of the first movement of Bartok's Music for Strings, Percussion and Celesta.
Fifth String Quartet
Excerpts from Bartok's Fifth String Quartetshowing their axes of tonal reflection.
Fourth String Quartet
Temporally displaced inversion, showing tonal reflection, from Bartok's Fourth String Quartet.
Tonally reflective harmonic progressions with single and multiple axes
Tonally reflective harmonic progressions with single and multiple axes.
Pathetique
Tonal reflection in the key structure of "parallel" movements of Beethoven's Pathetique sonata. All taken from "Symmetry as a Compositional Determinant", by Larry J. Solomon.

Knuth's "Generating all partitions"

    Shortly after the world learns that Archimedes may be the father of combinatorics via an article on NYT promoting the work of Dr. Reviel Netz that updated previous knowledge on the history of combinatorics (see Norman Biggs’s "The roots of combinatorics"), Donald E. Knuth, Professor Emeritus of The Art of Computer Programming at Stanford University, puts online the Pre-Fascicle 3b, “Generating all partitions”, of Volume 4 of The Art of Computing Programming.     I’ve read the previous volumes of “The Art of Computer Programming”, and this last pre-fascicle maintains the superb standards of Knuth’s style: a promethean quest for perfection; a well-integrated genetic viewpoint of the subject, with references extracted from almost every scholastic journal, even the most obscure one; and carefully conceived and graded exercises, to ease the understanding. If imitation is the most sincere form of flattery, then Knuth’s panache must be among the most sincerely flattered flairs of succeeding times.

14-02-2004

Humans against undecidability?

    A relation R is decidable if there is an effective procedure that, when given any x, it replies “YES” if x∈ R and “NO” if x∉R. Thus, the relation R is decidable if and only if the characteristic function of R is effectively computable. That is, the informal concept of a decidable relation is equaled to the formal concept of a recursive relation by the Church-Turing thesis.     The first unsolvable decision problem was provided by Alonzo Church: to determine if a formula in the λ-calculus has a normal form. Since then, a great amount of problems have been proved undecidable, shaping the nature of today’s applied computer science:

  • Halting problem, that is, to determine if a given Turing machine will eventually halt, and some of its subresults:
    • Equivalence problem (given two programs, do they compute the same function?).
    • Dead-code problem (given a block of code in a program, will that code ever be executed).
    • Find the shortest program that computes a function.
    • Membership problem in Turing machines.
    • Rice’s theorem:for any non-trivial property of partial functions, the question of whether a given algorithm computes a partial function with this property is undecidable.
  • Hilbert’s Tenth problem: an algorithm for testing polynomial equations with integer coefficients P(x1, …,xn)=0 for possession of a solution in integers.
  • First Order Logic’s undecidability: it is undecidable whether a first order logic formula is provable (or true under all possible interpretations).
  • Arithmetic and predicate calculus’s undecidability.
  • Post's Correspondence Problem
  • Alias aliasing
  • Context-sensitive data-independence analysis
  • Recognition for DCG
  • Acceptance in regular languages
  • Ambiguity of context-free grammars
    Since determining if a program is a virus is clearly undecidable, an antivirus industry that moves millions of dollars has emerged, always running behind the virus creators.     Given that it is undecidable to check if a program halts, or any non-trivial property about its execution, programmers fight against software bugs every day, while computer scientists try to perfectionate formal methods. Likewise, programmers test themselves trying to write minimal programs such as shellcodes, demos or programs for embedded platforms, another undecidable problem and thus, highly non-trivial.     With the ever-growing trend of technological automatization, I wonder how many jobs will be influenced by undecidability results in the future.

12-02-2004

The demise of stack based buffer overflow exploits?

    The most common form of buffer overflow exploits are stack based buffer overflow exploits. When one is found, programmers are to blame. Actually, they’re the result of a combination of calamitous circumstances:

  • processors without enough bits to properly mark pages
  • too permissive and naive programming languages
  • compilers of insecure languages generating unhardened code
  • operating systems without countermeasures
  • and, finally, the ignorant programmer
    Through the ’90, a number of countermeasures emerged (no rocket-science here):    Some of the previous countermeasures have resulted in some hardened operating systems:    Some argue that all the previous protections deplete resources and can easily be bypassed("Defeating the Stack Based Buffer Overflow Prevention Mechanism of Microsoft Windows 2003 Server", "Bypassing StackGuard and StackShield"), but the real goal is to prevent the exploitation with cut&paste shellcodes and make it harder to exploit stack based buffer overflows.     Looking to the future, Windows 2003 uses the hardware based protection of the latest processors(AMD's Execution Protection and Intel Itanium), which feature more bits to properly mark pages as non-executable. This and the .NET platform will deter hackers to exploit stack based buffer overflows but, without quantitative data of the number and percentage of these exploits over time, which I’ve been unable to find and I'm almost sure that it doesn't exist, nothing will be learned from the impact of these security technologies.

10-02-2004

Project Halo: Towards a Digital Aristotle

     Project Halo in a multi-staged effort towards a digital Aristotle, an application that will encompass much of the world's scientific knowledge and be capable of answering novel questions and advanced problem solving. Vulcan sees two primary functions for the digital Aristotle: first, as a tutor capable of instructing students in the sciences; and second, as a research assistant with broad interdisciplinary skills to help scientists in their work. Phase I of project Halo demonstrated that the current state of the art in knowledge representation and reasoning is capable of producing question answering technology capable of answering novel questions and providing domain-appropriate justifications in AP chemistry. The project demonstrated two drawbacks of the current state of the art:

  • knowledge formulation requires highly specialized and expensive personnel, pushing the per page cost of robust knowledge formulation to about $10,000 per page
  • most of the system failures were attributed to the fact that this specialized personnel lacked sufficient domain knowledge (in chemistry).
     Phase II will address these two issues head on by developing technology that will enable domain experts to formulate knowledge with ever increasing independence from knowledge engineers and be able to pose questions and problems to these systems. Vulcan believes that success in this objective will reduce the cost of knowledge formulation to levels comparable to textbook development and encourage scientists and educators to build an ever-expanding body of machine-processable knowledge that will form the basis for the digital Aristotle. The 30-month Phase II effort will involve teams with world class skills and technology in five primary areas: knowledge representation and reasoning (KRR), knowledge acquisition (KA), intelligent interfaces including natural language understanding, usability and system integration. Comprehensive user-driven evaluations will provide guidance to and validation of our research.      Description taken from a 85-minute length video presentation of Project Halo by Noah Friedland, a research project of Paul Allen’s Vulcan Inc. Another video presentation, 58-minutes length.      This project has brought memories back: when I was a 11-years old kid, I programmed a rule-based system to help me in the chemical identification process of compounds and minerals. In the Halo Pilot, three teams encoded 50 pages out of a chemistry syllabus to answer an exam of 100 questions, graded by three separate chemistry professors. Surprisingly, the three teams performed remarkably well, and they have generated a vast amount of information, including copies of the systems used to answer the questions.      The participants are leaders in field of knowledge-based representation and reasoning(descriptions from Vulcan’s final report):
  • Cycorp has over 600 person-years invested in the development of the world’s largest knowledge base, featuring over a million entities and relationships and tens of thousands of axioms organized into several thousand microtheories. These constructs are tied together by an “upper ontology.” Cycorp tries to leverage its knowledge base extensively when constructing new knowledge. Its reasoning engine has over 500 computational modules and claims to be fundamentally non-monotonic in its reasoning approach. It employs a truth maintenance system (TMS) to verify that new knowledge does not corrupt existing knowledge. Its formal language, CycL is an extremely expressive logic-based representation.
  • Ontoprise’s formal language is F-Logic. F-logic (“F” stands for “Frames”) combines the advantages of frame-based languages and the expressiveness, compact syntax, and well-defined semantics of logic programming languages such as Prolog. The original features of F-logic include signatures, object identity, complex objects, methods, classes, inheritance and rules. Their inference engine OntoBroker® provides means for efficient reasoning in F-Logic through a mixture of forward and backward chaining, based on a dynamic filtering algorithm to compute (the smallest possible) subset of the model for answering the query. The semantics for a set of F-Logic statements is then defined by a transformation process of F-Logic into normal logic (Horn logic with negation) and the well-founded semantics for the resulting set of facts and rules and axioms in normal logic. Their engine does not currently provide any framework for knowledge reuse, rather each knowledge base is constructed from scratch and customized for the defined requirements of the given project or problem at hand.
  • SRI’s approach to knowledge formulation relies on a component library consisting of several hundred concepts, which can be combined into more complex knowledge constructs. Their approach relies on the assumption that these fundamental building blocks can be easily extended and specialized for each new domain. SHAKEN, their KRR environment, features the KM formal language, an expressive frame-based representation. The engine supports monotonic reasoning only, with heuristics for handling identity, and does not, to date, employ a TMS. It does feature an automated entity classification capability.
Interesting documents: Some notes:
  • Cycorp featured the maximum running time because their database its two order of magnitude bigger: maybe they also use some exponential running time algorithm. And generated answers of multiple pages.
  • SRI was the winner of the "contest": they used professional chemists to encode the knowledge.
  • Ontoprise has the best language to encode the information.
  • The contest’s aim was to attract media attention and funding, because breaking one of the following requirements would have render the experiment totally unsuccessful, and the real breakthrough would be to ignore of one of these requisites:
    • Chemistry has a highly formalized body of knowledge, compared to other disciplines like philosophy, psychology and other social sciences.
    • Most questions were of “multiple-choice questions”, that is, answers were given: resolution by elimination is the best strategy. There were not real chemistry questions involving an scenario with some parameters and a goal.
    • No use of meta-reasoning capabilities, just instance-based solutions to prove generalized comprehension-oriented questions.
    • There were not really hard questions like:
      • Enumerative questions, that will need enumerative algebraic gemeotry.
      • Any NP-related problem. See A compendium of NP optimization problems
      • Logical inference that will need to use really big ordered binary decision diagrams.
      • Questions dealing with the quantum mechanically behaviour of the system, that is, calculations exponential in the number of atoms.
      • Questions dealing with uncomputability in physics.
    • Questions were encoded by humans, there was no natural language processing.
    • Experts in knowledge representation and even professional chemists worked in the project.
    Although somewhat impressive, experts in the field already knew that the systems will score high.     Finally, let’s suppose some entity waste some billion dollars to encode the knowledge of undergraduate physics, chemistry, biology and medicine using a somewhat improved version of SRI’s code with Ontoprise’s modelling language– note that encoding mathematics would be a real waste of time-. The biggest advantage here will be in cross-disciplinary answers to questions, because any expert in a particular field will answer more precisely field-related questions. Unfortunately, the experiment didn’t test this scenario, so analogies with Aristotle are out-of-place.

09-02-2004

The Trestle Electromagnetic Pulse Simulator

trestle.jpg
A B-52 bomber sits atop the TRESTLE electromagnetic pulse (EMP) simulator at Kirtland Air Force Base, New Mexico The facility is the largest wood-and-glue laminated structure in the world. Aircraft tested here are subjected to up to 10 million volts of electricity to simulate the effects of a nuclear explosion and assess the "hardness" of electrical and electronic equipment to the EMP pulse generated by a nuclear burst. Taken from U.S. Nuclear Weapons Cost Study Project. What are Electromagnetic Pulses? One of the effects of a nuclear detonation is a nuclear electromagnetic pulse(EMP), an invisible field able to destroy every unshielded electronic device. Enrico Fermi calculated the various parameters of the field before the first nuclear explosion but without much success. The gamma rays, travelling radially from the center of the detonation, produce positive ions and Compton-recoil electrons that escape faster than the remaining ions, creating a strong electric field. Depending on the altitude of the explosion, the effect will have different consequences: for example, it is widely known that a nuclear explosion over the center of the US at 400-500 km could destroy any electronic and electrical device, including power lines and communication networks. And it is possible to produce the same effect without nuclear explosions, a fact that has already been used by the US Army to produce armament. Trestle was one of the first efforts to test equipment and devise protections against EMP, like metallic shielding or tailored hardening of the most susceptible circuits.
A satellite view of TRESTLE
A satellite view of the facility

On the Sapir-Whorf hypothesis and its relation to programming languages

What is the Sapir-Whorf hypothesis?      Named after the American linguists Edward Sapir and Benjamin Lee Whorf, the hypothesis can be broken down in two principles: linguistic determinism and linguistic relativity.      The linguistic determinism principle asserts that language determines the way we think, that is, the grammar, semantics or lexicon in a natural language shape thought. This principle is divided in two groups: strong determinism and weak determinism. In the first case, language and thought are equaled, a variant that almost nobody advocates. However, weak determinism, the well-regarded version of the principle, states that thought may be influenced by language.      In the linguistic relativity principle, those who speak different languages perceive and think about the world differently, with particular meanings to words in a language.      Other supporters of the hypothesis are the behaviourists and the Neo-Classical literary theorists. To what extend does it hold?      Followers of Noam Chomsky, promoting an Universal Grammar and innate linguistic universals, rejected the Sapir-Whorf hypothesis just because it was the other side of the pendulum. A good book reflecting these critics is The Language Instinct, by Steve Pinker.      The strong Sapir-Whorf hypothesis does not even stand against some of its consequences: the impossibilities of translations between languages, reformulations in the same language and the verbalization of thought.      Some of the key differences of the weak Sapir-Whorf hypothesis are:

  • the importance of the social context of language, vg, the different uses of language in different social contexts.
  • thought is influenced by language, with no emphasis in the ineluctabilities caused by language.
  • language is also influenced by thought.
     For illustrative purpose of the Sapir-Whorf hypothesis, and assuming the reader’s mother tongue is one from the Indo-European group, a list of some counterintuitive properties is given here: The Tower of Babel, by Pieter Bruegel the Elder
  • In Tzeltal, the general word for EAT is TUN, but it changes depending in what it is eaten: K’UX for beans, LO’ for bananas, WE’ for tortillas and bread, TI’ for meat and chilis, TZ’U’ for sugarcane and UCH’ for corngruel and liquids.
  • In Carrier, the general word for beaver is TSA, but a small beaver is a TSAYAZ; a mid-size beaver is a TSATUL; a large beaver is a TSATSUL; a young beaver is a TSACHENISBOO’; an adult male beaver is a TSATA’; a female beaver is a TSA’AT; a mother beaver is a TSADIYA; the foreman beaver is a TSACHO; and the list goes on…
  • In Mohawk, the word KA’NIKONRIIO, righteousness, is also used as a word for law, beautiful or good.
  • In Mandarin, it is optional to indicate when an event occurred, and couldn’t be included in the verb. And position in the sentence governs interpretation.
  • In most Southeast Asian languages, the meaning of a sentence is highly context-dependent.
  • In Russian, the verb codifies tense, gender and quantity. Also, there are masculine days of the week (Monday, Tuesday, and Thursday) and feminine days of the week (Wednesday, Friday, and Saturday).
  • In Turkish, the verb specify whether the event was witnessed or if it is a rumour.
  • Riau Indonesian is an extreme case with no articles, neither inflection or tone, hardly no tense marking and a third person pronoun that is neutral to both gender and number.
  • In English, there is no grammatical gender system, unlike most languages.
  • In Qechua, there’s no gender and no articles, verbs are totally regular and there are articles to express personal knowledge (-m) or hearsay knowledge (-s), or to tag actions as futile (-ri-), important (-ru-), lamentable (-lla-), unusual (-yku-), performed for someone else’s benefit (-pa-) or for personal benefit (-ku-).
  • In German, “a young lady has no sex, while a turnip has… a tree is male, its buds are female, its leaves are neuter; horses are sexless, dogs are male, cats are female… tomcats included“, Mark Twain, “A Tramp Abroad”.
  • In Spanish, there are two words for corner, “esquina” and “rincón”, depending on whether the speaker is on the big or small angle of the two intersecting lines.
  • The name for the sun is feminine in German(Sonnenschein), masculine in Spanish (sol), and neuter in Russian. The name for the moon is feminine in Spanish(luna) and Russian, but masculine in German(Mond).
  • The word for key in German(Türschlüssel) is masculine, so German speakers think of them as hard, heavy, jagged, metal, serrated and useful; but key is feminine in Spanish(llave), so Spanish speakers describe them as golden, intricate, little, lovely, shiny and tiny. Also, the word for bridge in German(Brückle) is feminine, so German speakers describe bridges as beautiful, elegant, fragile, peaceful, pretty and slender; but bridges are masculine in Spanish(puente), so they are big, dangerous, long, strong, sturdy and towering.
Programming languages: how many? how different, how similar?      There are more than 2500 programming languages according to The Language List. A useful way to display them and their history is with the timeline of the Computer Languages History:
1954198019952003
Preview page 1Preview page 2Preview page 3Preview page 4Preview page 5Preview page 6Preview page 7Preview page 8Preview page 9
     However, they can be divided in three groups, according to their semantics:
  • Imperative/object oriented: designed around the von Neumann architecture, in which the CPU is separate from the memory and data and programs are stored in the same memory. Therefore, instructions and data are carried from memory to the CPU, and the results are moved back to memory. Thus, the main feature of imperative languages are variables –memory-, with assignment statements –pipe from/to memory- and the iterative form of repetition, with instructions stored in adjacent memory cells, discouraging the use of recursion for repetition. Examples are Fortran, C, Basic, Ada, Smalltalk, C++, Java, Eiffel, …
  • Functional: a more mathematical form of programming, the functional paradigm is based on the application of functions to parameters, without variables, assignments and iteration. Therefore, repetition must be done by recursion, there is no state in the sense of operational and denotational semantics and there are no side effects. Examples are LISP, Common Lisp, Caml, Scheme, ML, Haskell, …
  • Logic: syntax and semantics are a total departure from the imperative and functional paradigm. Declarative, the code are specifications of the desired results, a form of symbolic logic on which logical inferencing algorithms are applied. Examples are Prolog, Parlog, Tempo, Mozart, Mercury, Gödel, …
     A worth-reading book comparing them is Comparative Programming Languages, by L.B. Wilson and R.G. Clark. But does the Sapir-Whorf hypothesis apply to programming languages?      Yes, it does. And what is more, they are a pure example of the strong Sapir-Whorf hypothesis. Some computer scientists argue that because most programming languages are Turing-complete, they can express the same computation. This is partly true, an a close examination of the consequences of the Sapir-Whorf hypothesis will show why:
  • impossibilities of translations between languages: some high level language constructs cannot be expressed in other languages, like concurrency, process communication and synchronization, continuations, real-time keywords, etc… To use them, they will need external libraries programmed in low-level languages, which are not part of the original language.
  • impossibilities of reformulations in the same language: as with translations between languages, the same keywords will not have equivalent ways to express the same computation.
  • verbalization of thought: programming languages are tools used to create programs, and as tools, their limitations bound the programmer expressability and creativity.
Open Problems      A big challenge for computer science is to obtain a unified theory of programming, as discussed by Sir Tony Hoare in Unification of Theories; a Challenge for Computing Science.