Automating your first localization handoff (Help Center)

Comments

22 comments

  • Mira Yu

    Hi, Charles

    Thanks for this post - my company has a goal to localize 60+ pages of Help articles into 4+ languages, and we’d like to find a way to export English text in bulk, translate via SDL WorldServer GMS, and then import translated text in bulk.

    Since I am a Program Manager, not a developer, though, I am unclear about how to actually use the available Help Center Beta APIs. Do we need to construct a technical framework to deploy the APIs on our side before we start using the Zendesk APIs?

    Any plans to make tools and services available to assist in the deployment of web APIs?

    As with many companies, we have the need but no dedicated technical resources to actually execute on the design and deployment of APIs, so I’d appreciate your insight.

    Mira

  • Charles Nadeau

    Hi Mira,

    The APIs are included with your Zendesk, and like Zendesk they live on the web. The APIs for the Help Center are in beta right now and you'll need to sign up to the beta, but you won't have to deploy them on your side.

    The requirements for this article include a command-line interface like the command prompt in Windows or the Terminal in Mac. You also have to install a scripting language and a few libraries to work with the APIs. See [Part 2: Setting up your scripting environment](https://support.zendesk.com/entries/53090153#tlb).

    Try out (or get a technically minded colleague to try out) [Part 3: Creating the initial handoff package](https://support.zendesk.com/entries/53090153#ho) to get a feel for how it works. Copy the script, change the settings as indicated, and run the script from the command line. You can ignore the "How it works" section if it doesn't interest you. The script should export articles in bulk from your Help Center as HTML files.

    If you'd rather have assistance, we offer a paid service. Contact [Zendesk Services](http://www.zendesk.com/services) for options and pricing.

    Charles

     

  • Mira Yu

    Hi, Charles

    Thanks for all the info, I appreciate it!

    Mira

     

  • Jason Grunstra

    What is the best way to go about gathering all of the section IDs? It's not covered in the article anywhere, as far as I can tell.

  • Jason Grunstra

    And... if you have to put the section IDs in the section array, what is the point of the ignore array? Wouldn't you just leave out the IDs of the sections you don't want it to include?

  • Xiaochen Nie

    A question about publish_package.py: it seems the API it uses only works once.

    The second time, we get: Status: 400 Problem with the post request. Exiting.

    What API should we use if we want to fix some errors in the translated file and upload it again?

  • Charles Nadeau

    @Jason

    The **ignore** array should list all the _articles_ within the included sections that you don't want to include in the handoff. If you don't want to ignore any articles, specify an empty array:

    ignore = []
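
    For example, a handoff limited to two sections with no articles ignored might look like this in the script's settings. I'm assuming the section list is named sections here - use whatever name your copy of the script uses - and the ids below are placeholders:

    ```
    # Section ids to include in the handoff (placeholder values)
    sections = [200123456, 200654321]

    # Article ids to skip within those sections; an empty list includes everything
    ignore = []
    ```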

     

    **Getting the section ids**

    If you have fewer than 15 to 20 sections in the category, you can get the ids manually. Right-click the section title and open it in a new tab. The section id is specified in the URL.

    If you have more than 15-20 sections, you can use a script and the API to get the ids:

    ```
    import requests

    # Set the request parameters
    subdomain = 'your_subdomain'
    email = 'your_email_address'
    pwd = 'your_password'

    # Specify category id
    category = 200142577 # change this value

    # Do the HTTP get request using the Sections API
    url = 'https://{}.zendesk.com/api/v2/help_center/categories/{}/sections.json'.format(subdomain, category)
    response = requests.get(url, auth=(email, pwd))

    # Check for HTTP codes other than 200
    if response.status_code != 200:
        print('Status:', response.status_code, 'Problem with the request. Exiting.')
        exit()
    data = response.json()

    # Print the section ids
    for section in data['sections']:
        print(section['id'])
    ```
     

  • Charles Nadeau

    @Xiaochen

    I ran into the same 400 errors when trying to publish the articles more than once. If I remember correctly, the cause was trying to _create_ (through a post request) translations that already existed, which the API didn't allow. The solution was to _update_ rather than _create_, which you can do by changing the post request to a put request.

    To change to a put request, change the session method and status code in the following lines of the post_translation() function definition:

    response = session.**post**(url, data=payload)
    if response.status_code != **201**:

     to

    response = session.**put**(url, data=payload)
    if response.status_code != **200**:

     

  • Xiaochen Nie

    @Charles

    Thanks for your reply, but it seems the PUT method is not available; we get a 404 in return.

    Maybe it's because this is a beta version, and we're looking forward to the official release : )

  • Charles Nadeau

    Sorry, Xiaochen, my mistake. The endpoint for updating is different from the one used for creating. See [Updating translations](http://developer.zendesk.com/documentation/rest_api/hc/translations.html#updating-translations) in the API docs. The endpoint takes a locale value. Also, a 'locale' value isn't needed in the payload.
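
    For example, a minimal update request might look something like this (the article id, locale, and credentials are placeholders):

    ```
    import json
    import requests

    article = 12345678  # placeholder article id
    locale = 'fr'       # locale of the translation to update

    # Updating uses a put request to the translation's own endpoint
    url = 'https://your_subdomain.zendesk.com/api/v2/help_center/articles/{}/translations/{}.json'.format(article, locale)
    payload = json.dumps({'translation': {'title': 'Translated title', 'body': '<p>Translated body</p>'}})

    response = requests.put(url, data=payload,
                            headers={'Content-Type': 'application/json'},
                            auth=('your_email_address', 'your_password'))
    if response.status_code != 200:
        print('Status:', response.status_code, 'Problem with the put request. Exiting.')
        exit()
    ```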

    I added a new section in the article to cover your use case:

    - [Update the article translations](https://support.zendesk.com/entries/53090153#updates)

    The section contains a new **update_package.py** script that you can use. Thanks.

    Charles

  • Doug Begle

    Okay, so I've set up the structure and successfully pulled down the articles, but none of the graphics came over.

    Does the following text mean that I have to get them from the S3 cloud myself, or should this script pull them over automatically?

     

    _The images in most articles in the Zendesk documentation are hosted on an [Amazon S3](http://aws.amazon.com/s3/) file server. This option provides more flexibility for managing images. You can add, update, or delete the images on the file server using a file client such as [Cyberduck](http://cyberduck.io/)._

  • Charles Nadeau

    Hey Doug, the scripts don't bring down the images from the Amazon server. Sorry it's not clear in the article. Our images are already local, on the writers' drives in a couple of shared folders sync'ed to Box. We hand those off to loc directly -- no need to download. When it comes time to publish, we upload copies of the files to the server.

  • Doug Begle

    Hi Charles,

    Um, okay.  So how do I pull down my images since I don't have them local?

     

    thanks,

    Doug

  • Charles Nadeau

    We don't have a script for that, but it would involve getting all the img tags in the downloaded files, and then making HTTP requests to download the images like any other resource. I'd use Beautiful Soup to get the image urls:

    ```
    tree = BeautifulSoup(html_source)
    images = tree.find_all('img')
    ```

    Then I'd grab the src attribute in each img tag and use it to make a request for the image file from the server using the Requests module:

    ```
    for image in images:
        src = image['src']
        if src[:4] != 'http': continue
        response = session.get(src, stream=True)
    ```

    Note: I'm checking that the src value starts with http so it's a valid request URL.

    At this point, the image is in memory on your system. Next, I'd grab the filename from the src attribute and write it to file:

    ```
    file_name = src.split('/')[-1]
    with open(os.path.join(file_path, file_name), mode='wb') as f:
        for chunk in response.iter_content():
            f.write(chunk)
    ```

    One thing to be careful about: Most web servers only allow browsers to download images. So I'd set a header so my request looks like it's coming from a browser:

    ```
    session = requests.Session()
    session.auth = ('your_email', 'your_pwd')
    session.headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0'}
    ```
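
    Putting those pieces together, a rough sketch of the whole loop might look like this. The folder paths are placeholders, and I'm assuming the downloaded article files sit in a local handoff folder:

    ```
    import os

    import requests
    from bs4 import BeautifulSoup

    # Placeholder paths and credentials
    handoff_path = os.path.join('..', 'handoff', 'current')
    image_path = os.path.join(handoff_path, 'images')
    os.makedirs(image_path, exist_ok=True)

    # Pretend to be a browser so the image requests aren't refused
    session = requests.Session()
    session.auth = ('your_email', 'your_pwd')
    session.headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0'}

    for filename in os.listdir(handoff_path):
        if not filename.endswith('.html'):
            continue
        with open(os.path.join(handoff_path, filename), mode='r', encoding='utf-8') as f:
            html_source = f.read()

        # Find every img tag in the downloaded article
        tree = BeautifulSoup(html_source)
        for image in tree.find_all('img'):
            src = image.get('src', '')
            if src[:4] != 'http':
                continue
            response = session.get(src, stream=True)
            if response.status_code != 200:
                print('Status:', response.status_code, 'Could not get', src)
                continue

            # Write the image next to the handoff files
            file_name = src.split('/')[-1]
            with open(os.path.join(image_path, file_name), mode='wb') as img_file:
                for chunk in response.iter_content():
                    img_file.write(chunk)
            print('- Saved {}'.format(file_name))
    ```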

    Hope this helps.

  • Doug Begle

    I'm circling back to this topic now that I have some breathing room. When I run create_package.py, it retrieves the articles but throws an error when it tries to write:

    Traceback (most recent call last):
      File "create_package.py", line 46, in <module>
        with open(os.path.join(file_path, filename), mode='w', encoding='utf-8') as f:
    FileNotFoundError: [Errno 2] No such file or directory: '..\\handoff\\current\\203514393.html'

    Any idea what's going on? I can shoot you my create_package.py if that would help.

    thanks,

    Doug

  • Tzvi Gordon

    Hi,

    In a straightforward Zendesk, it works great!

    However, on a different, more complex Zendesk Help Center, I get an error:
    Traceback (most recent call last):
      File "create_package.py", line 38, in <module>
        body = BeautifulSoup(data['translation']['body'])
    KeyError: 'translation'

    I know for a fact that the category is set for translation, as is the section I want to create for translation.

    What am I missing?

    Thanks,
    Gordon

  • Neal Timpe

    Hi Charles, thanks for posting this. It's working pretty well so far. I'm having a UTF-8 encoding issue with the create_package.py script.

    I'm finding stray symbols in weird places in the HTML output after I run the script.

    I searched stackoverflow for some information.

    http://stackoverflow.com/questions/7219361/python-and-beautifulsoup-encoding-issues

    When I inspect an article's JSON, the metadata says it's encoded as UTF-8. This stackoverflow article says that sometimes BeautifulSoup can get confused and encode things wrong.

    One of the stackoverflow articles says to add a UTF-8 ignore line. I've experimented a bit, but I don't know enough about either soup or python to figure it out. Any ideas?

    soup = BeautifulSoup.BeautifulSoup(content.decode('utf-8','ignore'))

    Am I missing something?

    Thanks!

  • Charles Nadeau

    Hi Neal,

    Encoding problems are a pain. Let's start with the basics. What application are you using to view the problem HTML? Sometimes an app itself has trouble interpreting encoded characters. Have you tried opening the file in a different text editor? TextWrangler is a great free text editor for the Mac - http://www.barebones.com/products/TextWrangler/.

  • Neal Timpe

    Hi Charles,

    There aren't any encoding issues when I view it in TextWrangler or Atom. They only show up when I view it in Firefox. And when I selected an encoding by clicking View > Text Encoding > Unicode, the issue was fixed. More importantly, it's causing problems when I send it to my translation service. When I inspect the HTML, I notice there's an empty head element when I run the script as is. I'm not sure if that's the normal result, but that's the result I get.

    I got halfway to a solution on my own. I added the head variable.

            tree = BeautifulSoup('<html></html>')
            head = BeautifulSoup('<head><meta charset="utf-8"></head>')
            body = BeautifulSoup(data['translation']['body'])
            title = tree.new_tag('h1')
            title.string = data['translation']['title']

            tree.html.append(title)
            tree.html.append(head)
            tree.html.append(body)

    I know this is all wrong and it adds the meta tag in the wrong place, but even with the meta element inserted after the h1, it cleared up the encoding problem in Firefox. I'm going to experiment some more to figure out how to stick the meta in the head element. My working hypothesis right now is that there isn't an encoding issue, but Firefox/Chrome are guessing the wrong encoding. I'm going to add an encoding hint for the browsers and translation tools and see if that works.

    Thanks for getting back to me so quickly!

  • Neal Timpe

    Update: I have a solution for my encoding issue and I want to share it. My hypothesis was that FF/chrome was guessing the encoding wrong, so I added a meta tag element to tell the browser what encoding to use. That seemed to work. My last post put the meta tag in the wrong place. I learned a little more and figured out how to put it in the right place. :-)

    I also added a DOCTYPE statement that identifies the HTML as HTML5, a title element that automatically includes the title from each article, and an en-us language attribute. It took me a long time to figure out that I couldn't stick the DOCTYPE statement in with the other append functions and that I had to stick it in as a string in the write function before it wrote the payload to the file.

            tree = BeautifulSoup('<html></html>')
            meta = BeautifulSoup('<meta charset="utf-8">')
            body = BeautifulSoup(data['translation']['body'])
            title = tree.new_tag('h1')
            title.string = data['translation']['title']
            titlehead = tree.new_tag('title')
            titlehead.string = data['translation']['title']
            declare = '<!DOCTYPE html>'

            tree.html.append(meta)
            tree.html.append(titlehead)
            tree.html.append(title)
            tree.html.append(body)
        tree.find('html')['lang'] = 'en-us'

            filename = '{}.html'.format(article)
            with open(os.path.join(file_path, filename), mode='w', encoding='utf-8') as f:
                f.write(declare + tree.prettify())
            print('- Saved {}'.format(filename))

    If you use this code, you'll need to remove the title element from the HTML before you import the translated content into Zendesk. So, I included the code I modified for publish_package.py and update_package.py. I added tree.title.decompose() at the end to remove the title element and its contents.

        tree = BeautifulSoup(html_source)
        title = tree.h1.string.strip()
        tree.h1.decompose()
        tree.title.decompose()

    Thanks for sharing this code and this article. I'm kind of a novice at this but your explanation really helped me understand what was going on in the code enough to modify it.

  • Charles Nadeau

    This is great information, Neal. Thanks for sharing.

  • Tami Settergren

    Charles,

    Thanks so much for posting these scripts and explaining them so well. We'll probably localize our KB into eight languages, which would have been a copy-paste headache. This is the automated solution I was hoping to find, and probably wouldn't have come up with myself (at least not for a long time!) since I'm not a programmer.
