$Id$

Interfacing with Tor: Clients and Controllers

Copyright 2005 Nick Mathewson -- see LICENSE for licensing information

WARNING

  THIS DOCUMENT WILL LEAD YOU ASTRAY. IT IS OLD AND CURSED WITH BITROT AND
  EVIL SPIRITS. It is preserved for historical interest only. See instead
  ../README-Java.txt and ../python/README

0. About this document

  This document has instructions for writing programs to interface with Tor.
  You should read it if you want to write a Tor controller, or if you want to
  make your programs work with Tor correctly.

0.1. Further reading

  You should probably have a good idea first of what Tor does and how it
  works; see the main Tor documentation for more detail. If you want full
  specifications for the data formats and protocols Tor uses, see
  tor-spec.txt, control-spec.txt, and socks-extensions.txt, all of which are
  included with the Tor distribution.

1. Writing a controller

  A controller is a program that connects to the Tor client and sends it
  commands. With a controller, you can examine and change Tor's configuration
  on the fly, change how circuits are built, and perform other operations.

  As of the most recent version (0.1.0.11), Tor does not have its controller
  interface enabled by default. You need to configure it to listen on some
  local port by using the "ControlPort" configuration directive, either in
  the torrc file, like this:
     ControlPort 9100
  Or on the command line, like this:
     tor -controlport 9100

  Then your controller can connect to Tor. But see the notes on
  authentication below (2.2).

  This document covers the Python and Java interfaces to Tor, and the
  underlying "v1" control protocol introduced in Tor version 0.1.1.0. Earlier
  versions used an older and trickier control protocol which is not covered
  here; see "control-spec-v0.txt" for details.

1.1. Getting started

  When you're writing a controller, you can either connect to Tor's control
  port and send it commands directly, or you can use one of the libraries
  we've written to automate this for you.
  Right now, there are libraries in Java and Python.

  First, you need to load the library and open a new connection to the Tor
  process.

  In Java:
    import net.freehaven.tor.control.TorControlConnection;
    import java.net.Socket;

    public class Demo {
      // "throws Exception" covers the IOException that Socket can raise.
      public static final void main(String[] args) throws Exception {
        Socket s = new Socket("127.0.0.1", 9100);
        TorControlConnection conn = TorControlConnection.getConnection(s);
        conn.authenticate(new byte[0]); // See section 2.2
        // ...
      }
    }

  In Python:
    import socket
    import TorCtl
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(("127.0.0.1", 9100))
    conn = TorCtl.get_connection(s)
    conn.authenticate("") # See section 2.2
    # ...

  The factory method that you use to create a connection will check whether
  the version of Tor you've connected to supports the newer ("v1") text-based
  control protocol or the older ("v0") binary control protocol.

  Using the v1 protocol, just connect to the control port and say:
    AUTHENTICATE

  (For more information on using the v1 protocol directly, see x.x)

1.2. Configuration and information

  Now that you've got a connection to Tor, what can you do with it?

  One of the easiest operations is manipulating Tor's configuration
  parameters. You can retrieve or change the value of any configuration
  variable by calling the appropriate method.

  In Java:
    // Get one configuration variable.
    List options = conn.getConf("contact");
    // Get a set of configuration variables.
    List options = conn.getConf(Arrays.asList(new String[]{
           "contact", "orport", "socksport"}));
    // Change a single configuration variable
    conn.setConf("BandwidthRate", "1 MB");
    // Change several configuration variables
    conn.setConf(Arrays.asList(new String[]{
           "HiddenServiceDir /home/tor/service1",
           "HiddenServicePort 80",
    }));
    // Reset some variables to their defaults
    conn.resetConf(Arrays.asList(new String[]{
           "contact", "socksport"
    }));
    // Flush the configuration to disk.
    conn.saveConf();

  In Python:
    # Get one configuration variable
    options = conn.get_option("contact")
    # Get a set of configuration variables.
    options = conn.get_option(["contact", "orport", "socksport"])
    # Change a single configuration variable
    conn.set_option("BandwidthRate", "1 MB")
    # Change several configuration variables
    conn.set_option([
         ("HiddenServiceDir", "/home/tor/service1"),
         ("HiddenServicePort", "80")])
    # Reset some variables to their defaults.
    conn.reset_options(["contact", "socksport"])
    # Flush the configuration to disk.
    conn.save_conf()

  Talking to Tor directly:
    GETCONF contact
    GETCONF contact orport socksport
    SETCONF bandwidthrate="1 MB"
    SETCONF HiddenServiceDir=/home/tor/service1 HiddenServicePort=80
    SAVECONF

  For a list of configuration options recognized by Tor, see the main Tor
  manual page.

1.2.1. Using order-sensitive configuration variables

  In the above example, you'll note that configuration options are returned
  as a list of key-value pairs, and not in the more intuitive
  map-from-keys-to-values form that you might expect. This is because some of
  Tor's configuration options can appear more than once, and the ordering of
  these options often matters. The 'Log' option is an example: if more than
  one log is configured, the option will appear more than once. Sometimes
  options are interrelated: the HiddenServicePort option applies to the
  immediately previous HiddenServiceDir. (To retrieve all the hidden service
  settings, in order, fetch the value of the virtual "HiddenServiceOptions"
  variable.)

  When you are setting these options, you must set them all at once. For
  example, suppose that there are three logs configured:
    Log debug-debug file /tmp/debug_log
    Log notice-err file /tmp/tor_log
    Log err file /tmp/errors

  If you want to change the third log file option, you need to re-send the
  other two settings, so that Tor knows not to delete them.

1.3. Getting status information

  Tor exposes other status information beyond what is set in configuration
  options.
  You can access this information with the "getInfo" method.

  In Java:
    // get a single value.
    String version = conn.getInfo("version");
    // get several values
    Map vals = conn.getInfo(Arrays.asList(new String[]{
       "addr-mappings/config", "version"}));

  In Python:
    # Get a single value
    version = conn.get_info("version")
    # Get several values
    vals = conn.get_info(["addr-mappings/config", "version"])

  Using the v1 control interface directly:
    GETINFO version
    GETINFO addr-mappings/config version

  For a complete list of recognized keys, see "control-spec.txt".

1.4. Signals

  You can send named "signals" to the Tor process to have it perform certain
  recognized actions. For example, the "RELOAD" signal makes Tor reload its
  configuration file. (If you're used to Unix platforms, this has the same
  effect as sending a HUP to the Tor process.)

  In Java:
    conn.signal("RELOAD");

  In Python:
    conn.signal("RELOAD")

  Using the v1 control protocol:
    SIGNAL RELOAD

  The recognized signal names are:
    "RELOAD" -- Reload configuration information
    "SHUTDOWN" -- Start a clean shutdown of the Tor process
    "DUMP" -- Write current statistics to the logs
    "DEBUG" -- Switch the logs to debugging verbosity
    "HALT" -- Stop the Tor process immediately.

  (See control-spec.txt for an up-to-date list.)

1.5. Listening for events

  Tor can tell you when certain events happen. To learn about these events,
  first you need to give the control connection an "EventHandler" object to
  receive the events of interest. Then, you tell the Tor process which events
  it should send you.

  These examples intercept and display log messages.

  In Java:
    import net.freehaven.tor.control.NullEventHandler;
    import net.freehaven.tor.control.EventHandler;
    // We extend NullEventHandler so that we don't need to provide empty
    // implementations for all the events we don't care about.
    // ...
    EventHandler eh = new NullEventHandler() {
       public void message(String severity, String msg) {
         System.out.println("["+severity+"] "+msg);
       }
    };
    conn.setEventHandler(eh);
    conn.setEvents(Arrays.asList(new String[]{
       "DEBUG", "INFO", "NOTICE", "WARN", "ERR"}));

  In Python:
    class LogHandler:
        def msg(self, severity, message):
            print("[%s] %s" % (severity, message))
    conn.set_event_handler(LogHandler())
    conn.set_events(["DEBUG", "INFO", "NOTICE", "WARN", "ERR"])

  Using the v1 protocol: (See x.x for information on parsing the results)
    SETEVENTS DEBUG INFO NOTICE WARN ERR

1.5.1. Kinds of events

  The following event types are currently recognized:

  CIRC: The status of a circuit has changed. These events include an ID
    string to identify the circuit, the new status of the circuit, and a list
    of the routers in the circuit's current path. The possible status values
    are:
      LAUNCHED -- the circuit has just been started; no work has been done
                  yet to build it.
      EXTENDED -- the circuit has just been extended a single step.
      BUILT    -- the circuit is finished.
      FAILED   -- the circuit could not be built, and has been abandoned.
      CLOSED   -- a successfully built circuit is now closed.

  STREAM: The status of an application stream has changed. These events
    include a string to identify the stream, the new status of the stream,
    the ID of the circuit (if any) that the stream is using, and the
    destination of the stream.
    Recognized status values are:
      NEW         -- an application has asked for an anonymous connection
      NEWRESOLVED -- an application has asked for an anonymous hostname
                     lookup
      SENTCONNECT -- the stream has been attached to a circuit, and we have
                     sent a connection request down the circuit
      SENTRESOLVE -- the stream has been attached to a circuit, and we have
                     sent a lookup request down the circuit
      SUCCEEDED   -- the stream has been connected, or the lookup request has
                     been answered
      FAILED      -- the stream failed and cannot be retried
      CLOSED      -- the stream closed normally
      DETACHED    -- the stream was detached from its circuit, but could be
                     reattached to another.

  ORCONN: The status of a connection to an OR has changed. These events
    include a string to identify the OR, and the status of the connection.
    Current status values are:
      LAUNCHED  -- we have started a connection to the OR
      CONNECTED -- we are successfully connected to the OR
      FAILED    -- we could not successfully connect to the OR
      CLOSED    -- an existing connection to the OR has been closed.

  BW: Amount of bandwidth used in the last second. These events include the
    number of bytes read, and the number of bytes written.

  DEBUG, INFO, NOTICE, WARN, ERR: Tor has logged a message. These events
    include the severity of the message, and its textual content.

  NEWDESC: A new server descriptor has been received. These events include a
    list of IDs for the servers whose descriptors have changed.

  ADDRMAP: Tor has added a new address mapping. These events include the
    address mapped, its new value, and the time when the mapping will expire.

  (See control-spec.txt for an up-to-date list.)

1.5.2. Threading issues

  In the Python and Java control libraries, responses from the Tor controller
  are handled in a separate thread of execution. Ordinarily, this thread is a
  "daemon thread" that exits when your other threads are finished.
  This could be a problem if you want your main thread to stop, and have the
  rest of your program's functionality handled by events from the Tor control
  interface.

  To make the controller thread stay alive when your other threads are
  finished, call the controller's "launch thread" method after you create the
  controller, and before you call the authenticate method.

  In Java:
    conn.launchThread(false); // Not in daemon mode

  In Python:
    conn.launch_thread(daemon=0) # Not in daemon mode

1.6. Overriding directory functionality

  You can tell Tor about new server descriptors. (Ordinarily, it learns about
  these from the directory server.)

  In Java:
    // Get a descriptor from some source
    String desc = ...;
    // Tell Tor about it
    conn.postDescriptor(desc);

  In Python:
    # Get a descriptor from some source
    desc = ...
    # Tell Tor about it
    conn.post_descriptor(desc)

  With the v1 protocol:
    +POSTDESCRIPTOR
    (the descriptor text goes here)
    .

1.7. Mapping addresses

  Sometimes it is desirable to map one address to another, so that a
  connection request to address "A" will result in a connection to address
  "B".

  For example, suppose you are writing an anonymized DNS resolver. While you
  can already ask Tor to resolve addresses like "tor.eff.org" using the SOCKS
  interface, some special addresses (like "6sxoyfb3h2nvok2d.onion" or
  "tor.eff.org.tor26.exit") don't correspond to normal IP addresses. To get
  around this, your DNS resolver could ask Tor to map unallocated IP
  addresses to these special hostnames, and then pass those IP addresses back
  to the requesting application. When the application tries to connect to the
  IP, Tor will redirect the request to the correct hostname.

  In Java:
    String onionAddr = "6sxoyfb3h2nvok2d.onion";
    // Make all requests for 127.0.0.100 be rewritten to the chosen addr.
    conn.mapAddress("127.0.0.100", onionAddr);
    // Ask Tor to choose an unallocated IP address to be rewritten to the
    // chosen address.
    String newAddress = conn.mapAddress("0.0.0.0", onionAddr);
    // To remove the mapping for an address, map it to itself
    conn.mapAddress("127.0.0.100", "127.0.0.100");

  In Python:
    onionAddr = "6sxoyfb3h2nvok2d.onion"
    # Make all requests for 127.0.0.100 be rewritten to the chosen addr.
    conn.map_address("127.0.0.100", onionAddr)
    # Ask Tor to choose an unallocated IP address to be rewritten to the
    # chosen address.
    newAddress = conn.map_address("0.0.0.0", onionAddr)
    # To remove the mapping for an address, map it to itself
    conn.map_address("127.0.0.100", "127.0.0.100")

  From the v1 control interface:
    MAPADDRESS 127.0.0.100=6sxoyfb3h2nvok2d.onion
    MAPADDRESS 0.0.0.0=6sxoyfb3h2nvok2d.onion

  Note that you can receive a list of the address mappings set from the
  control interface by requesting the status value "addr-mappings/control".
  See 1.3 above.

1.8. Managing streams and circuits.

  Tor allows controllers to exercise fine control over building circuits,
  attaching streams to circuits, and so on. (Note that it is possible to make
  Tor pretty nonfunctional by use of these features; act with care.) To
  manipulate a circuit or stream, you will need its ID; you can learn about
  these IDs in one of three ways:

    1. Call a function that creates a new circuit/stream: it will return the
       ID.

    2. Listen for an event that tells you that a circuit or stream's status
       has changed. (See 1.5 above)

    3. Get a list of all circuits and streams by getting the appropriate
       status information values; see control-spec.txt for more information.

  Once you have these IDs, you can *extend* a circuit (by adding a new Tor
  server to its path), *attach* a stream to a circuit (causing it to exit
  from the last node in the circuit's path), *redirect* a stream (changing
  its target address), or *close* a circuit or stream.

  Note that it is only safe to redirect or attach a stream that is not open:
  that is, one that has not already sent a BEGIN or RESOLVE cell, or one
  which has been detached.

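  The safety rule above can be turned into a small bookkeeping helper. The
  following Python 3 sketch is not part of the TorCtl library: the class name
  StreamTracker and its methods are hypothetical, and the mapping from STREAM
  event statuses (section 1.5.1) to "not open" is an assumption read off the
  status descriptions, namely that NEW and NEWRESOLVED streams have not yet
  sent a BEGIN or RESOLVE cell and that DETACHED streams may be reattached.

```python
# Assumption: these STREAM statuses mean the stream is not open and may
# therefore be redirected or attached. This is inferred from the status
# descriptions in section 1.5.1; it is not a list taken from Tor itself.
ATTACHABLE_STATUSES = frozenset({"NEW", "NEWRESOLVED", "DETACHED"})

# Statuses after which a stream ID is no longer worth remembering.
FINAL_STATUSES = frozenset({"FAILED", "CLOSED"})


class StreamTracker:
    """Hypothetical helper that remembers the latest status of each stream."""

    def __init__(self):
        self._status = {}

    def on_stream_event(self, stream_id, status):
        """Record one STREAM event (stream ID plus its new status)."""
        if status in FINAL_STATUSES:
            self._status.pop(stream_id, None)
        else:
            self._status[stream_id] = status

    def can_attach(self, stream_id):
        """Return True if the stream looks safe to redirect or attach."""
        return self._status.get(stream_id) in ATTACHABLE_STATUSES
```

  A controller could call on_stream_event from its STREAM event handler and
  check can_attach before redirecting or attaching a stream. Unknown stream
  IDs are treated as unsafe.
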
  See the Tor documentation, especially XXXX or XXXX, for more information
  about what streams and circuits are and how they work.

  In Java:
    // Launch a new circuit through the routers moria1 and moria2
    String circID = conn.extendCircuit("0", "moria1,moria2");
    // Extend the circuit through tor26
    conn.extendCircuit(circID, "tor26");

    String streamID = ....; // Learn about a stream somehow.
    // Change its target address
    conn.redirectStream(streamID, "tor.eff.org");
    // Attach it to our circuit
    conn.attachStream(streamID, circID);
    // Close the stream (The byte is the 'reason' for closing it; see
    // tor-spec.txt)
    conn.closeStream(streamID, 0);
    // Close the circuit ("true" means "only if it has no live streams")
    conn.closeCircuit(circID, true);

  In Python:
    # Launch a new circuit through the routers moria1 and moria2
    circID = conn.extend_circuit("0", ["moria1", "moria2"])
    # Extend the circuit through tor26
    conn.extend_circuit(circID, ["tor26"])

    streamID = .... # Learn about a stream somehow.
    # Change its target address
    conn.redirect_stream(streamID, "tor.eff.org")
    # Attach it to our circuit
    conn.attach_stream(streamID, circID)
    # Close the stream
    conn.close_stream(streamID)
    # Close the circuit (IFUNUSED means "only if it has no live streams")
    conn.close_circuit(circID, flags=["IFUNUSED"])

2. General topics

2.1. Naming servers

  Where the name of a server is called for, it is safest to refer to a server
  by its identity digest. This is the same as the server's fingerprint, with
  the spaces removed, preceded by a $. This prevents your program from
  getting confused by multiple servers with the same nickname. (Yes, this is
  possible.)

  For example, moria1's digest is:
     "$FFCB46DB1339DA84674C70D7CB586434C4370441".

2.2. Authentication and security

  By default, Tor will open control ports on the localhost address,
  127.0.0.1. This means that only connections from programs on the same
  computer will be allowed.
  This isn't very secure, however: it allows any program run by any user to
  give commands to your Tor process. To prevent this, Tor allows you to set a
  password for authentication.

  The best time to do this is before Tor is started, so that there won't be a
  window of vulnerability.

  There are two ways to set up authentication: by asking Tor to generate a
  cookie file, or by passing Tor a hashed password.

  If you're on an operating system with good filesystem security (so that
  other users can't read Tor's files), and your controller is running as a
  user that can read Tor's files, pass Tor the "--CookieAuthentication 1"
  option when you start it. Tor will create a file in its data directory
  called "control_auth_cookie". All your controller needs to do is to pass
  the contents of this file to authenticate() when it connects to Tor.

  If you'd rather not trust the filesystem, or if Tor is set to run as a
  different user, you can use password security. You don't need to have users
  pick these passwords; you should have the controller generate them randomly
  when it starts Tor. Tor doesn't take the password directly; that would risk
  exposure. Instead, it wants a secure hash of the password in its
  HashedControlPassword option. You can get one of these hashes by running
  "tor --hash-password", or by calling the provided functions in the
  controller libraries.

  In Java:
    // Create a new random password and its hash.
    PasswordDigest d = PasswordDigest.generateDigest();
    byte[] s = d.getSecret(); // pass this to authenticate
    String h = d.getHashedPassword(); // pass this to Tor on startup.

  In recent versions of Python (with os.urandom):
    import os
    secret = os.urandom(32) # pass this to authenticate
    hash = TorCtl.s2k_gen(secret) # pass this to Tor on startup.

3. Getting started with the v1 control protocol

  The "v1" Tor control protocol is line-based: you send Tor lines, each
  ending with a CR LF pair, and Tor replies with a set of lines, each ending
  with a CR LF pair.

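  The line framing described above might be handled as follows when talking
  to the control port without a library. This is a minimal Python 3 sketch,
  not code from the Tor distribution; the function names frame_command and
  split_lines are hypothetical, and ASCII encoding is assumed.

```python
CRLF = b"\r\n"


def frame_command(command: str) -> bytes:
    """Return the bytes to send for one single-line command."""
    if "\r" in command or "\n" in command:
        raise ValueError("a single-line command must not contain CR or LF")
    return command.encode("ascii") + CRLF


def split_lines(buffer: bytes):
    """Split received bytes into complete lines plus any unfinished tail.

    Returns (lines, remainder): 'lines' is a list of decoded lines without
    their CR LF terminators, and 'remainder' holds bytes that arrived after
    the last CR LF and should be kept until more data is read.
    """
    *complete, remainder = buffer.split(CRLF)
    return [line.decode("ascii") for line in complete], remainder
```

  A controller would write frame_command("GETINFO version") to the socket,
  append whatever it reads to a buffer, and call split_lines on that buffer,
  keeping the remainder for the next read.
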
  When multi-line data needs to be encoded, it is terminated by a single line
  containing only a period. Lines in that data that start with a period have
  an additional single period added to the front.

  When one of the commands you send is followed by multi-line data, its name
  starts with a plus (such as +POSTDESCRIPTOR).

  Your controller will need to parse Tor's replies. Each of these replies is
  also line-based. Each reply line starts with a three-character status code
  (such as "250" for success), and a single "continuation" character ("+",
  "-", or " "). The rest of the line is the reply message.

  If the continuation character is " ", this line is the last in the reply.
  If the continuation character is "+", the reply line is followed by
  multi-line data, and more lines. Otherwise, if the continuation character
  is "-", the reply line is followed by more lines.

  Not every reply line you receive from the controller is in response to an
  immediately preceding control message. Status codes that start with the
  character "6" are _events_ in response to an earlier SETEVENTS command, and
  are sent asynchronously.

  See control-spec.txt for full documentation.

4. Making a program use Tor

  Suppose you have a simple network application, and you want that
  application to send its traffic over Tor. This is pretty simple to do:

  - Make sure your protocol is stream based. If you're using TCP, you're
    fine; if you're using UDP or another non-TCP protocol, Tor can't cope
    right now.

  - Make sure that connections are unidirectional. That is, make sure that
    your protocol can run with one host (the 'originating host' or 'client')
    originating all the connections to the other (the 'responding host' or
    'server'). If the responding host has to open TCP connections back to the
    originating host, it won't be able to do so when the originating host is
    anonymous.

  - For anonymous clients: Get your program to support SOCKS4a or SOCKS5 with
    hostnames.
    Right now, when your clients open a connection, they probably do a two
    step process of:
      * Resolve the server's hostname to an IP address.
      * Connect to the server.

    Instead, make sure that they can:
      * Connect to a local SOCKS proxy.
      * Tell the SOCKS proxy about the server's hostname and port.
        In SOCKS4a, this is done by sending these bytes, in order:
          0x04                 (socks version)
          0x01                 (connect)
          PORT                 (two bytes, most significant byte first)
          0x00 0x00 0x00 0x01  (fake IP address: tells proxy to use SOCKS4a)
          0x00                 (empty username field)
          HOSTNAME             (target hostname)
          0x00                 (marks the end of the hostname field)
      * Wait for the SOCKS proxy to connect to the server.
        In SOCKS4a, it will reply with these bytes in order:
          0x00                 (response version)
          STATUS               (0x5A means success; other values mean failure)
          PORT                 (not set)
          ADDRESS              (not set)

  - For hidden services: Make sure that your program can be configured to
    accept connections from the local host only.

  For more information on SOCKS, see references [1], [2], and [3]. For more
  information on Tor's extensions to the SOCKS protocol, including extensions
  that let you do DNS lookups over SOCKS, see "socks-extensions.txt" in the
  Tor distribution.

4.1. Notes on DNS

  Note that above, we encourage you to use SOCKS4a or SOCKS5 with hostnames
  instead of using SOCKS4 or SOCKS5 with IP addresses. This is because your
  program needs to make Tor do its hostname lookups anonymously. If your
  program resolves hostnames on its own (by calling gethostbyname or a
  similar API), then it will effectively broadcast the names of the hosts it
  is about to connect to.

  See http://wiki.noreply.org/noreply/TheOnionRouter/TorFAQ#SOCKSAndDNS for
  more details.

4.2. Notes on authentication by IP address

  If your service uses IP addresses to prevent abuse, you should consider
  switching to a different model. Once your software works with Tor, annoying
  people may begin using Tor to conceal their IP addresses.
  If the best abuse-prevention scheme you have is IP based, you'll be forced
  to choose between blocking all users who want privacy, and allowing abuse.
  If you've implemented a better authorization scheme, you won't have this
  problem.

4.3. Cleaning your protocol

  You aren't done just because your connections are anonymous. You need to
  consider whether the application itself is doing things to compromise your
  users' anonymity.

  Here are some things to watch out for:

  Information Leaks

  - Does your application include any information about the user in the
    protocol?

  - Does your application include any information about the user's computer
    in the protocol? This can include not only the computer's IP address or
    MAC address, but also the version of the software, the processor type,
    installed hardware, or any other information that can be used to tell
    users apart.

  - Do different instances of your application behave differently? If there
    are configuration options that make it easy to tell users apart, are they
    really necessary?

References:
  [1] http://archive.socks.permeo.com/protocol/socks4.protocol
  [2] http://archive.socks.permeo.com/protocol/socks4a.protocol
  [3] SOCKS5: RFC1928

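Appendix: a SOCKS4a sketch

  The SOCKS4a byte sequences listed in section 4 can be sketched in Python 3
  as follows. The function names are hypothetical, and the 8-byte reply
  length (one version byte, one status byte, two port bytes, four address
  bytes) is taken from the SOCKS4 protocol in reference [1] rather than from
  the text above. Nothing here opens a network connection; the functions only
  build and inspect byte strings.

```python
import struct


def build_socks4a_connect(hostname: str, port: int) -> bytes:
    """Build a SOCKS4a CONNECT request that leaves name resolution to the proxy."""
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    host = hostname.encode("ascii")
    if not host or b"\x00" in host:
        raise ValueError("hostname must be non-empty and contain no NUL byte")
    return (
        b"\x04"                    # socks version
        + b"\x01"                  # connect
        + struct.pack(">H", port)  # port, most significant byte first
        + b"\x00\x00\x00\x01"      # fake IP address: selects SOCKS4a
        + b"\x00"                  # empty username field
        + host                     # target hostname
        + b"\x00"                  # end of the hostname field
    )


def socks4a_succeeded(reply: bytes) -> bool:
    """Return True if an 8-byte SOCKS4a reply reports success (0x5A)."""
    if len(reply) != 8:
        raise ValueError("a SOCKS4 reply is expected to be 8 bytes long")
    return reply[0] == 0x00 and reply[1] == 0x5A
```

  A client would send the request bytes to Tor's SOCKS port, read eight bytes
  back, and only start using the connection if socks4a_succeeded returns
  True. Because the hostname travels inside the request, the client never
  performs its own DNS lookup (see 4.1).
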