$Id$

                Interfacing with Tor: Clients and Controllers
   Copyright 2005 Nick Mathewson -- see LICENSE for licensing information


               WARNING THIS DOCUMENT WILL LEAD YOU ASTRAY.
            IT IS OLD AND CURSED WITH BITROT AND EVIL SPIRITS.
              It is preserved for historical interest only.   

          
            See instead ../README-Java.txt and ../python/README
             

0. About this document

   This document has instructions for writing programs to interface with
   Tor.  You should read it if you want to write a Tor controller, or if you
   want to make your programs work with Tor correctly.

0.1. Further reading

   You should probably have a good idea first of what Tor does and how it
   works; see the main Tor documentation for more detail.

   If you want full specifications for the data formats and protocols Tor
   uses, see tor-spec.txt, control-spec.txt, and socks-extensions.txt, all of
   which are included with the Tor distribution.

1. Writing a controller

   A controller is a program that connects to the Tor client and sends
   it commands.  With a controller, you can examine and change Tor's
   configuration on the fly, change how circuits are built, and perform
   other operations.

   As of the most recent version (0.1.0.11), Tor does not have its controller
   interface enabled by default.  You need to configure it to listen on some
   local port by using the "ControlPort" configuration directive, either in
   the torrc file, like this:

       ControlPort 9100

   Or on the command line, like this:

       tor -controlport 9100

   Then your controller can connect to Tor.  But see the notes on
   authentication below (3.2).

   This document covers the Python and Java interfaces to Tor, and the
   underlying "v1" control protocol introduced in Tor version
   0.1.1.0. Earlier versions used an older and trickier control protocol which
   is not covered here; see "control-spec-v0.txt" for details.

1.1. Getting started

   When you're writing a controller, you can either connect to Tor's control
   port and send it commands directly, or you can use one of the libraries
   we've written to automate this for you.  Right now, there are libraries in
   Java and Python.

   First, you need to load the library and open a new connection to the Tor
   process.  In Java:

     import net.freehaven.tor.control.TorControlConnection;
     import java.net.Socket;

     public class Demo {
       public static final void main(String[] args) {
         Socket s = new Socket("127.0.0.1", 9100);
         TorControlConnection conn = TorControlConnection.getConnection(s);
         conn.authenticate(new byte[0]); // See section 3.2
         // ...
       }
     }

   In Python:

      import socket
      import TorCtl

      s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      s.connect(("127.0.0.1", 9100))
      conn = TorCtl.get_connection(s)
      conn.authenticate("")  # See section 3.2
      # ...

   The factory method that you use to create a connection will check whether
   the version of Tor you've connected to supports the newer ("v1")
   text-based control protocol or the older ("v0") binary control protocol.

   Using the v1 protocol, just connect to the control port and say:

      AUTHENTICATE

   (For more information on using the v1 protocol directly, see x.x)

1.2. Configuration and information

   Now that you've got a connection to Tor, what can you do with it?

   One of the easiest operations is manipulating Tor's configuration
   parameters.  You can retrieve or change the value of any configuration
   variable by calling the appropriate method.

   In Java:

       // Get one configuration variable.
       List options = conn.getConf("contact");
       // Get a set of configuration variables.
       List options = conn.getConf(Arrays.asList(new String[]{
              "contact", "orport", "socksport"}));
       // Change a single configuration variable
       conn.setConf("BandwidthRate", "1 MB");
       // Change several configuration variables
       conn.setConf(Arrays.asList(new String[]{
              "HiddenServiceDir /home/tor/service1",
              "HiddenServicePort 80",
       }));
       // Reset some variables to their defaults
       conn.resetConf(Arrays.asList(new String[]{
              "contact", "socksport"
       }));
       // Flush the configuration to disk.
       conn.saveConf();

   In Python:
       # Get one configuration variable
       options = conn.get_option("contact")
       # Get a set of configuration variables.
       options = conn.get_option(["contact", "orport", "socksport"])
       # Change a single configuration variable
       conn.set_option("BandwidthRate", "1 MB")
       # Change several configuration variables
       conn.set_option([
              ("HiddenServiceDir", "/home/tor/service1"),
              ("HiddenServicePort", "80")])
       # Reset some variables to their defaults.
       conn.reset_options(["contact", "socksport"])
       # Flush the configuration to disk.
       conn.save_conf()

   Talking to Tor directly:

       GETCONF contact
       GETCONF contact orport socksport
       SETCONF bandwidthrate="1 MB"
       SETCONF HiddenServiceDir=/home/tor/service1 HiddenServicePort=80
       SAVECONF

   For a list of configuration options recognized by Tor, see the main Tor
   manual page.

1.2.1. Using order-sensitive configuration variables

   In the above example, you'll note that configuration options are returned
   as a list of key-value pairs, and not in the more intuitive map-from-keys-
   to-values form that you might expect.  This is because some of Tor's
   configuration options can appear more than once, and the ordering of these
   options often matters.  The 'Log' option is an example: if more than one
   log is configured, the option will appear more than once.  Sometimes
   options are interrelated: the HiddenServicePort option applies to the
   immediately previous HiddenServiceDir.

   (To retrieve all the hidden service settings, in order, fetch the value of
   the virtual "HiddenServiceOptions" variable.)

   When you are setting these options, you must set them all at once.  For
   example, suppose that there are three logs configured:
      Log debug-debug file /tmp/debug_log
      Log notice-err file /tmp/tor_log
      Log err file /tmp/errors
   If you want to change the third log file option, you need to re-send the other
   two settings, so that Tor knows not to delete them.

1.3. Getting status information

   Tor exposes other status information beyond those set in configuration
   options.  You can access this information with the "getInfo" method.
   In Java:

       // get a single value.
       String version = conn.getInfo("version");
       // get several values
       Map vals = conn.getInfo(Arrays.asList(new String[]{
          "addr-mappings/config", "version"}));

   In Python:

       # Get a single value
       version = conn.get_info("version")
       # Get several values
       vals = conn.get_info(["addr-mappings/config", "version"])

   Using the v1 control interface directly:

       GETINFO version
       GETINFO addr-mappings/config version

   For a complete list of recognized keys, see "control-spec.txt".

1.4. Signals

   You can send named "signals" to the Tor process to have it perform
   certain recognized actions.  For example, the "RELOAD" signal makes Tor
   reload its configuration file.  (If you're used to Unix platforms, this
   has the same effect as sending a HUP to the Tor process.)

   In Java:

       conn.signal("RELOAD");

   In Python:

       conn.signal("RELOAD")

   Using the v1 control protocol:

       SIGNAL RELOAD

   The recognized signal names are:
       "RELOAD" -- Reload configuration information
       "SHUTDOWN" -- Start a clean shutdown of the Tor process
       "DUMP" -- Write current statistics to the logs
       "DEBUG" -- Switch the logs to debugging verbosity
       "HALT" -- Stop the Tor process immediately.

   (See control-spec.txt for an up-to-date list.)

1.5. Listening for events

   Tor can tell you when certain events happen.  To learn about these events,
   first you need to give the control connection an "EventHandler" object to
   receive the events of interest.  Then, you tell the Tor process which
   events it should send you.

   These examples intercept and display log messages.  In Java:

       import net.freehaven.tor.control.NullEventHandler;
       import net.freehaven.tor.control.EventHandler;
       // We extend NullEventHandler so that we don't need to provide empty
       // implementations for all the events we don't care about.
       // ...
       EventHandler eh = new NullEventHandler() {
          public void message(String severity, String msg) {
            System.out.println("["+severity+"] "+msg);
       };
       conn.setEventHandler(eh);
       conn.setEvents(Arrays.asList(new String[]{
          "DEBUG", "INFO", "NOTICE", "WARN", "ERR"}));

   In Python:

       class LogHandler:
           def msg(self, severity, message):
               print "[%s] %s"%(severity, message)
       conn.set_event_handler(LogHandler())
       conn.set_events(["DEBUG", "INFO", "NOTICE", "WARN", "ERR"])

    Using the v1 protocol:  (See x.x for information on parsing the results)
       SETEVENTS DEBUG INFO NOTICE WARN ERR

1.5.1. Kinds of events

    The following event types are currently recognized:

      CIRC: The status of a circuit has changed.
        These events include an ID string to identify the circuit, the
        new status of the circuit, and a list of the the routers in the
        circuit's current path.  The possible status values are:
          LAUNCHED -- the circuit has just been started; no work has been
            done yet to build it.
          EXTENDED -- the circuit has just been extended a single step.
          BUILT -- the circuit is finished.
          FAILED -- the circuit could not be built, and has been abandoned.
          CLOSED -- a successfully built circuit is now closed.

      STREAM: The status of an application stream has changed.
        These events include an string to identity the stream, the
        new status of the stream, the ID of the circuit (if any)
        that the stream is using, and the destination of the stream.
        Recognized status values are:
          NEW -- an application has asked for an anonymous connection
          NEWRESOLVED -- an application has asked for an anonymous hostname
              lookup
          SENTCONNECT -- the stream has been attached to a circuit, and we
              have sent a connection request down the circuit
          SENTRESOLVE -- the stream has been attached to a circuit, and we
              have sent a lookup request down the circuit
          SUCCEEDED -- the stream has been connected, or the lookup request
              has been answered
          FAILED -- the stream failed and cannot be retried
          CLOSED -- the stream closed normally
          DETACHED -- the stream was detached from its circuit, but could be
              reattached to another.

      ORCONN: The status of a connection to an OR has changed.
        These events include a string to identify the OR, and the status of
        the connection.  Current status values are:
          LAUNCHED -- we have started a connection to the OR
          CONNECTED -- we are successfully connected to the OR
          FAILED -- we could not successfully connect to the OR
          CLOSED -- an existing connection to the OR has been closed.

      BW: Amount of bandwidth used in the last second.
        These events include the number of bytes read, and the number of
        bytes written.

      DEBUG, INFO, NOTICE, WARN, ERR: Tor has logged a message.
        These events include the severity of the message, and its textual
        content.

      NEWDESC: A new server descriptor has been received.
        These events include a list of IDs for the servers whose descriptors
        have changed.

      ADDRMAP: Tor has added a new address mapping.
        These events include the address mapped, its new value, and the time
        when the mapping will expire.

   (See control-spec.txt for an up-to-date list.)

1.5.2. Threading issues

    In the Python and Java control libraries, responses from the Tor
    controller are handled in a separate thread of execution.  Ordinarily,
    this thread is a "daemon thread" that exits when your other threads are
    finished.  This could be a problem if you want your main thread to stop,
    and have the rest of your program's functionality handled by events from
    the Tor control interface.  To make the controller thread stay alive when
    your other threads are finished, call the controller's "launch thread"
    method after you create the controller, and before you call the
    authenticate method.

    In Java:

        conn.launchThread(false);  // Not in daemon mode

    In Python:

        conn.launch_thread(daemon=0)  # Not in daemon mode

1.6. Overriding directory functionality

   You can tell Tor about new server descriptors.  (Ordinarily, it learns
   about these from the directory server.)  In Java:

       // Get a descriptor from some source
       String desc = ...;
       // Tell Tor about it
       conn.postDescriptor(desc);

   In Python:

       # Get a descriptor from some source
       desc = ...
       # Tell Tor about it
       conn.post_descriptor(desc)

   With the v1 protocol:

       +POSTDESCRIPTOR
       <the descriptor goes here>
       .

1.7. Mapping addresses

   Sometimes it is desirable to map one address to another, so that a
   connection request to address "A" will result in a connection to address
   B.  For example, suppose you are writing an anonymized DNS resolver.
   While you can already ask Tor to resolve addresses like "tor.eff.org"
   using the SOCKS interface, some special addresses (like
   "6sxoyfb3h2nvok2d.onion" or "tor.eff.org.tor26.exit") don't correspond to
   normal IP addresses.  To get around this, your DNS resolver could ask Tor
   to map unallocated IP addresses to these special hostnames, and then pass
   those IP addresses back to the requesting application.  When the
   application tries to connect to the IP, Tor will redirect the request to
   the correct hostname.

   In Java:

      String onionAddr = "6sxoyfb3h2nvok2d.onion";
      // Make all requests for 127.0.0.100 be rewritten to the chosen addr.
      conn.mapAddress("127.0.0.100", onionAddr);
      // Ask Tor to choose an unallocated IP address to be rewritten to the
      // chosen address.
      String newAddress = conn.mapAddress("0.0.0.0", onionAddr);
      // To remove the mapping for an address, map it to itself
      conn.mapAddress("127.0.0.100", "127.0.0.100");

   In Python:

      onionAddr = "6sxoyfb3h2nvok2d.onion"
      # Make all requests for 127.0.0.100 be rewritten to the chosen addr.
      conn.map_address("127.0.0.100", onionAddr)
      # Ask Tor to choose an unallocated IP address to be rewritten to the
      # chosen address.
      newAddress = conn.map_address("0.0.0.0", onionAddr)
      # To remove the mapping for an address, map it to itself
      conn.map_address("127.0.0.100", "127.0.0.100")

   From the v1 control interface:

      MAPADDRESS 127.0.0.1=6sxoyfb3h2nvok2d.onion
      MAPADDRESS 0.0.0.0=6sxoyfb3h2nvok2d.onion

   Note that you can receive a list of the address mappings set from the
   control interface by requesting the status value "addr-mappings/control".
   See 1.3 above.

1.8. Managing streams and circuits.

   Tor allows controllers to exercise fine control over building circuits,
   attaching streams to circuits, and so on.  (Note that it is possible to
   make Tor pretty nonfunctional by use of these features; act with care.)
   To manipulate a circuit or stream, you will need its ID; you can learn
   about these IDs in one of three ways:

      1. Call a function that creates a new circuit/stream: it will return
         the ID.

      2. Listen for an event that tells you that a circuit or stream's
         status has changed. (See 2.5 above)

      3. Get a list of all circuits and streams by getting the appropriate
         status information values; see control-spec.txt for more
         information.


   Once you have these IDs, you can *extend* a circuit (by adding a new
   Tor server to its path), *attach* a stream to a circuit (causing it to
   exit from the last node in the server's path), *redirect* a stream
   (changing its target address), or *close* a server or stream.

   Note that it is only safe to redirect or attach a stream that is not open:
   that is, one that has not already sent a BEGIN or RESOLVE cell, or one
   which has been detached.

   See the Tor documentation, especially XXXX or XXXX, for more information
   about what streams and circuits are and how they work.

   In Java:

       // Launch a new circuit through the routers moria1 and moria2
       String circID = conn.extendCircuit("0", "moria1,moria2");
       // Extend the circuit through tor26
       conn.extendCircuit(circID, "tor26");
       String streamID = ....; // Learn about a stream somehow.
       // Change its target address
       conn.redirectStream(streamID, "tor.eff.org");
       // Attach it to our circuit
       conn.attachStream(streamID, circID);
       // Close the stream (The byte is the 'reason' for closing it; see
       //       tor-spec.txt)
       conn.closeStream(streamID, 0);
       // Close the circuit ("true" means "only if it has no live streams")
       conn.closeCircuit(circID, true);

   In Python:

       # Launch a new circuit through the routers moria1 and moria2
       circID = conn.extend_circuit("0", ["moria1", "moria2"])
       # Extend the circuit through tor26
       conn.extend_circuit(circID, ["tor26"])
       streamID = ....    #  Learn about a stream somehow.
       # Change its target address
       conn.redirect_stream(streamID, "tor.eff.org")
       # Attach it to our circuit
       conn.attach_stream(streamID, circID)
       # Close the stream
       conn.close_stream(streamID)
       # Close the circuit (IFUNUSED means "only if it has no live streams")
       conn.closeCircuit(circID, flags=["IFUNUSED"])

2. General topics

2.1. Naming servers

   Where the name of a server is called for, it is safest to refer to a
   server by its identity digest.  This is the same as the server's
   fingerprint, with the spaces removed, preceded by a $.  This prevents
   your program from getting confused by multiple servers with the same
   nickname.  (Yes, this is possible.)

   For example, moria1's digest is:
   "$FFCB46DB1339DA84674C70D7CB586434C4370441".

2.2. Authentication and security

   By default, Tor will open control ports on the localhost address,
   127.0.0.1.  This means that only connections from programs on the same
   computer will be allowed.  This isn't very secure, however: it allows any
   program run by any user to give commands to your Tor process.  To prevent
   this, Tor allows you to set a password for authentication.  The best time
   to do this is before Tor is started, so that there won't be a window of
   vulnerability.

   There are two ways to set up authentication: by asking Tor to generate a
   cookie file, or by passing Tor a hashed password.

   If you're on an operating system with good filesystem security (so that
   other users can't read Tor's files), and your controller is running as a
   user that can read Tor's files, pass Tor the "--CookieAuthentication 1"
   option when you start it.  Tor will create a file in its data directory
   called "control_auth_cookie".  All your controller needs to do is to pass
   the contents of this file to authenticate() when it connects to Tor.

   If you'd rather not trust the filesystem, or if Tor is set to run as a
   different user, you can use password security.  You don't need to have
   users pick these passwords; you should have the controller generate them
   randomly when it starts Tor.  Tor doesn't take the password directly; that
   would risk exposure.  Instead, it wants a secure hash of the password in
   its HashedControlPassword option.  You can get one of these hashes by
   running "tor --hash-password", or by calling the provided functions in the
   controller libraries.

   In Java:
      // Create a new random password and its hash.
      PasswordDigest d = PasswordDigest.generateDigest();
      byte[] s = d.getSecret(); // pass this to authenticate
      String h = d.getHashedPassword() // pass this to the Tor on startup.

   In recent versions of Python (with os.urandom):
      secret = os.urandom(32) # pass this to authenticate
      hash = TorCtl.s2k_gen(secret) # pass this to Tor on startup.

3. Getting started with the v1 control protocol

   The "v1" Tor control protocol is line-based: you send Tor lines, each
   ending with a CR LF pair, and Tor replies with a set of lines, each ending
   with a CR LF pair.

   When multi-line data needs to be encoded, it is terminated by a single
   line containing only a period. Lines in that data that start with a period
   have an additional single period added to the front.  When one of the
   commands you send is followed by multi-line data, its name starts with a
   plus (such as +POSTDESCRIPTOR).

   Your controller will need to parse Tor's replies.  Each of these replies
   is also line-based.  Each reply line starts with a three-character status
   code (such as "250" for success), and a single "continuation" character
   ("+", "-", or " ").  The rest of the line is the reply message.

   If the continuation character is " ", this line is the last in the reply.
   If the continuation character is "+", the reply line is followed by
   multi-line data, and more lines.  Otherwise, if the continuation character
   is "-", the reply line is followed by more lines.

   Not every reply line you receive from the controller is in response to an
   immediately preceding control message.  Status codes that start with the
   character "6" are _events_ in response to an earlier SETEVENTS command,
   and are sent asynchronously.

   See control-spec.txt for full documentation.

4. Making a program use Tor

   Suppose you have a simple network application, and you want that
   application to send its traffic over Tor.  This is pretty simple to do:

     - Make sure your protocol is stream based.  If you're using TCP, you're
       fine; if you're using UDP or another non-TCP protocol, Tor can't cope
       right now.

     - Make sure that connections are unidirectional.  That is, make sure
       that your protocol can run with one host (the 'originating host' or
       'client') originating all the connections to the other (the
       'responding host' or 'server').  If the responding host has to open
       TCP connections back to the originating host, it won't be able to do
       so when the originating host is anonymous.

     - For anonymous clients: Get your program to support SOCKS4a or SOCKS5
       with hostnames.  Right now, when your clients open a connection, they
       probably do a two step process of:
         * Resolve the server's hostname to an IP address.
         * Connect to the server.
       Instead, make sure that they can:
         * Connect to a local SOCKS proxy.
         * Tell the SOCKS proxy about the server's hostname and port.
           In SOCKS4a, this is done by sending these bytes, in order:
             0x04                 (socks version)
             0x01                 (connect)
             PORT                 (two bytes, most significant byte first)
             0x00 0x00 0x00 0x01  (fake IP address: tells proxy to use
                                   SOCKS4a)
             0x00                 (empty username field)
             HOSTNAME             (target hostname)
             0x00                 (marks the end of the hostname field)
         * Wait for the SOCKS proxy to connect to the server.
           In SOCKS4a, it will reply with these bytes in order:
             0x00                 (response version)
             STATUS               (0x5A means success; other values mean
                                   failure)
             PORT                 (not set)
             ADDRESS              (not set)

     - For hidden services: Make sure that your program can be configured to
       accept connections from the local host only.

   For more information on SOCKS, see references [1], [2], and [3].  For more
   information on Tor's extensions to the SOCKS protocol, including
   extensions that let you do DNS lookups over SOCKS, see
   "socks-extensions.txt" in the Tor distribution.

4.1. Notes on DNS

   Note that above, we encourage you to use SOCKS4a or SOCKS5 with hostnames
   instead of using SOCKS4 or SOCKS5 with IP addresses.  This is because your
   program needs to make Tor do its hostname lookups anonymously.  If your
   program resolves hostnames on its own (by calling gethostbyname or a
   similar API), then it will effectively broadcast the names of the hosts it
   is about to connect to.

   See http://wiki.noreply.org/noreply/TheOnionRouter/TorFAQ#SOCKSAndDNS for
   more details.

4.2. Notes on authentication by IP address

   If your service uses IP addresses to prevent abuse, you should consider
   switching to a different model.  Once your software works with Tor,
   annoying people may begin using Tor to conceal their IP addresses.  If the
   best abuse-prevention scheme you have is IP based, you'll be forced to
   choose between blocking all users who want privacy, and allowing abuse.
   If you've implemented a better authorization scheme, you won't have this
   problem.

4.3. Cleaning your protocol

   You aren't done just because your connections are anonymous.  You need to
   consider whether the application itself is doing things to compromise your
   users' anonymity.  Here are some things to watch out for:

   Information Leaks
     - Does your application include any information about the user
       in the protocol?

     - Does your application include any information about the user's
       computer in the protocol?  This can include not only the computer's IP
       address or MAC address, but also the version of the software, the
       processor type, installed hardware, or any other information that can
       be used to tell users apart.

     - Do different instances of your application behave differently?  If
       there are configuration options that make it easy to tell users apart,
       are they really necessary?

References:
 [1] http://archive.socks.permeo.com/protocol/socks4.protocol
 [2] http://archive.socks.permeo.com/protocol/socks4a.protocol
 [3] SOCKS5: RFC1928