Zendesk REST API tutorial - Backing up your knowledge base with Python

Comments

47 comments

  • Avatar
    Brent Schaus

    Merci, Charles. This is fantastic :)

  • Avatar
    margery tongway

    Thanks, I'll give this one a go. I've been using a Ruby script for the last year or so, but this one looks better.

    Cheers

  • Avatar
    Roland Pellegrin

    Thanks, this is really helpful.
    Just in case, did anyone write a restore script? It would be really useful for translating the knowledge base.

  • Avatar
    Adam Goolie Gould

    I'd been thinking about how to "back up" our production knowledgebase to our sandbox.

    Once the backup is complete as per this method, could we restore to our sandbox?

    Also, +1 to @Roland's question about a restore script.

  • Avatar
    Dan Lanir

    This is awesome. Thank you so much

  • Avatar
    Dan Lanir

    From what I can tell (on this page as well as on the API definition page for Knowledge Base), you cannot pull the "last updated by" value of the KB. Does anybody know a way to do this?

  • Avatar
    Fred Thomas

    @Adam Goolie Gould

    Hey Adam!

    "Technically" you could apply the backup of your production knowledge base to your Sandbox. However, the benefit may not be worth the added steps, because the Sandbox environment is completely separate from your production instance of Zendesk. This means you can more easily restore your backup to your production environment (if and when needed) from the same files that you saved on the machine where you performed the initial backup.

    So yes, you could do this. I just caution against doing it in an effort to implement some sort of "synced" redundancy, because the Sandbox environment doesn't work that way.

    I hope this information helps!

    Cheers,

    Fred Thomas | Customer Advocate

  • Avatar
    Charles Nadeau

    @Roland, you'll find scripts and instructions on restoring HTML files that come back from localization here:

    https://help.zendesk.com/hc/en-us/articles/229489108#post

    It's part of a larger article on using the API to automate the first localization handoff.

  • Avatar
    Grow

    This is awesome, but it's CRAZY that there's no way to back up or restore content inside the app. I wrote about that here: https://support.zendesk.com/hc/en-us/community/posts/209398128-Need-ability-to-recover-or-back-up-Help-Center-content?preview_as_role=manager

  • Avatar
    Russur

    Is there any way, perhaps via the API, to download the images as well?

  • Avatar
    Charles Nadeau

    Hi Russur,


    There's no image API, but once you've downloaded the articles on your system, a number of Python libraries and techniques can let you read the image URLs in the files and make requests to download them. I like BeautifulSoup for parsing HTML, and Requests to make HTTP requests. You can do a Google search for other options.


    As for me, I'd write a script that opens each file and uses BeautifulSoup to get the image URLs:


    tree = BeautifulSoup(html_source, 'html.parser')  # html_source is the file's HTML as a string
    images = tree.find_all('img')

    Then I'd grab the src attribute in each img tag and use it to make a request for the image file from the server using the Requests library:


    for image in images:
        src = image['src']
        if src[:4] != 'http': continue
        response = session.get(src, stream=True)

    Note: I'm checking that the first 4 characters of the src are http so it's a valid request URL.


    At this point, the image is in memory on my system. Next, I'd grab the filename from the src attribute and write the image to a file:


    file_name = src.split('/')[-1]
    with open(os.path.join(file_path, file_name), mode='wb') as f:
        for chunk in response.iter_content():
            f.write(chunk)

    One thing to be careful about: Some web servers only allow browsers to download images. So I'd set a header so my request looks like it's coming from a browser:


    session = requests.Session()
    session.auth = ('your_email', 'your_pwd')
    session.headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0'}
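
    Putting the fragments above together, a minimal sketch might look like the following. It is not the tutorial's code: it swaps BeautifulSoup for the standard library's html.parser so it has no third-party dependencies, and it takes the session as a parameter, so any object with a Requests-style get() method works. The names ImageSrcCollector, image_urls, image_file_name, and download_images are hypothetical, and skipping non-200 responses is an assumption about what you'd want.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImageSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag (stdlib stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            src = dict(attrs).get('src')
            if src:
                self.sources.append(src)


def image_urls(html_source):
    """Return the image URLs in an HTML string that start with http."""
    collector = ImageSrcCollector()
    collector.feed(html_source)
    return [src for src in collector.sources if src.startswith('http')]


def image_file_name(src):
    """Derive a local file name from the URL path, ignoring any query string."""
    return os.path.basename(urlparse(src).path)


def download_images(html_source, session, file_path):
    """Save each image under file_path and return the list of files written."""
    saved = []
    for src in image_urls(html_source):
        response = session.get(src, stream=True)
        if response.status_code != 200:
            continue  # assumption: silently skip images the server refuses to serve
        target = os.path.join(file_path, image_file_name(src))
        with open(target, mode='wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
        saved.append(target)
    return saved
```

    With Requests installed, you'd pass in the session configured above, for example download_images(html_source, session, file_path).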

    Hope this helps.

  • Avatar
    Russur

    Also, is there any way to get the names of the sections that contain the HTML files?

    Thanks

  • Avatar
    Russur

    That did great! Here is the hacked code that I added. It goes above the line:


            endpoint = data['next_page']


    # begin included code to search and pull out images
    tree = BeautifulSoup(article['body'], "html.parser")
    images = tree.find_all('img')

    for image in images:
        src = image['src']
        if src[:4] != 'http': continue
        response = session.get(src, stream=True)

        file_name = src.split('/')[-1]
        image_dir = src.split('/')[-2]
        file_name = str(article['id']) + '_' + image_dir + '_' + file_name
        with open(os.path.join(backup_path, file_name), mode='wb') as f:
            for chunk in response.iter_content():
                f.write(chunk)
    # End of included code


    I also added

    from bs4 import BeautifulSoup

    towards the top.


    This will get the graphic as well as the directory name that Zendesk created for the image. I will probably update this to get the section name, and maybe recreate the directory structure. Thanks!

  • Avatar
    Russur

    Is there a way to get the title of the section that the article is contained in? I was trying to get the Section API call to work, but I'm having no luck.

  • Avatar
    Charles Nadeau

    Hi Russur,

    You could sideload the sections with your articles in the API call (the include=sections parameter in the following example):

     

    endpoint = zendesk + '/api/v2/help_center/{locale}/articles.json?include=sections'.format(locale=language.lower())

     

    Then you can associate the section_id in each article record with a section record, which will contain the section title.

     

    For more info on how sideloading works, see https://help.zendesk.com/hc/en-us/articles/229489048-Sideloading-related-records

     

    For a tutorial that covers sideloading along the way and a technique to associate records, see Getting large data sets, especially the section on sideloading.
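
    As a rough illustration of associating the records, the sketch below builds a lookup from section id to section name and uses it to label each article. It assumes each page of results has an articles list and a sideloaded sections list, and that a section record stores its title in a name field. The function names and the sample data are made up for the example.

```python
def section_names_by_id(data):
    """Map each sideloaded section's id to its name."""
    return {section['id']: section['name'] for section in data.get('sections', [])}


def label_articles(data):
    """Return (section name, article title) pairs for one page of results."""
    names = section_names_by_id(data)
    return [(names.get(article['section_id'], 'unknown section'), article['title'])
            for article in data['articles']]


# Hypothetical page of results, shaped like a response to ?include=sections
sample_page = {
    'articles': [
        {'id': 101, 'title': 'Resetting your password', 'section_id': 7},
        {'id': 102, 'title': 'Exporting reports', 'section_id': 8},
    ],
    'sections': [
        {'id': 7, 'name': 'Account'},
        {'id': 8, 'name': 'Reporting'},
    ],
}

for section_name, title in label_articles(sample_page):
    print(section_name + ': ' + title)
```

    Building the dictionary once per page avoids searching the sections list again for every article.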

     

     

  • Avatar
    David Conway

    Every time I try to run the make_backup.py python3 script, I keep getting the following:


    Failed to retrieve articles with error 401

  • Avatar
    Charles Nadeau

    Hi David, a 401 points to a problem with the authentication credentials. Can you double-check the Zendesk email and password you entered on line 7 under "Code complete" above?


    The other thing to check is to see if your Zendesk is configured to allow passwords for API requests. In the admin interface, click the Admin button (gear icon) in the lower-left, then Channels > API. At the bottom of the page there should be a checkbox to enable password access.
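
    If it helps while troubleshooting, here is a small sketch of a request wrapper that separates a 401 from other failures and repeats the two checks above in the error message. It is not part of the backup script. The function name get_articles_page is hypothetical, and session is assumed to be a Requests session (or anything with a compatible get() method) that already carries your credentials.

```python
def get_articles_page(session, endpoint):
    """Fetch one page of articles, raising a descriptive error on failure."""
    response = session.get(endpoint)
    if response.status_code == 401:
        # 401 means the API rejected the credentials sent with the request
        raise PermissionError(
            'Error 401: check the email and password, and check that '
            'password access is enabled for the API.')
    if response.status_code != 200:
        raise RuntimeError(
            'Failed to retrieve articles with error {}'.format(response.status_code))
    return response.json()
```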

  • Avatar
    David Conway

    The enable password access checkbox is checked. My credentials are fine, as I can get into Zendesk and manage articles. The API is not liking something. We use Gmail accounts for authentication into Zendesk. Could this be the problem?

  • Avatar
    Charles Nadeau

    That could be the problem. Because you're authenticated with a Google password, your Zendesk profile might not have a Zendesk password. If you're an admin, you should be able to add one yourself. See Resetting user passwords. Use the Set option instead of the Reset one.

  • Avatar
    David Conway

    That did the trick.  Everything seems to be working fine now.  Thanks.

  • Avatar
    David Conway

    OK, I got all articles exported fine. The articles contain all of their images and video attachments. I am now playing with the idea of restoring a deleted article using curl. I am using the following syntax:


    curl https://servienthelp.zendesk.com/api/v2/help_center/sections/{id}/articles.json \
    -d '{"article": {"title": "How to take pictures in low light", "body": "Use a tripod", "locale": "en-us" }}' \
    -v -u {email_address}:{password} -X POST -H "Content-Type: application/json"


    I can see {id} is the id of the section that the article belongs to.


    title is the actual title of the article being imported.


    body - do not know what that is or how to obtain that information.


    locale - language source


    email_address - needed for authentication


    password - needed for authentication.


    I successfully created the article, but do not see the body of the article.


     

  • Avatar
    Charles Nadeau

    Hi David, I'm not sure I understand the question. The `body` attribute specifies the content of the article. It's probably going to be one long JSON string, so it's probably easier to move the JSON to a file and import it in the curl statement. See Move JSON data to a file in our curl article.

  • Avatar
    David Conway

    When I run the make_backup.py program, I get a bunch of HTML files, which are the articles. I open one of the HTML files and I see the content, including any attachments. How do I import these HTML files when I need to? Is there a curl command that will allow it? I have all the information I need, like the title, the id of the section the article belongs to, the actual HTML file to be imported, the article id, and its position. So there is a scripted way to get the articles out, but is there a scripted way to get them back in?

  • Avatar
    Charles Nadeau

    Ah, I see. You'll need a script to parse the content of each HTML file, convert it to JSON, and post the data to Help Center. cURL is probably not the most efficient tool for this if you have more than a handful of articles.

     

    In Python, you can use the BeautifulSoup library to parse the content. One technique is described in Add the article translations, which is part of a larger tutorial on publishing localized articles on Help Center.
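
    For the "convert it to JSON" step, a minimal sketch could look like this. It uses the standard json module, which escapes the quotes and line breaks in the HTML for you. The payload shape (an article object with title, body, and locale) follows the curl example earlier in this thread. The function names are hypothetical, and payload_from_file assumes the file holds only the article body.

```python
import json


def article_payload(title, html_body, locale='en-us'):
    """Wrap an article's title and HTML body in the JSON structure the API expects."""
    return json.dumps({'article': {'title': title, 'body': html_body, 'locale': locale}})


def payload_from_file(path, title, locale='en-us'):
    """Read a backed-up HTML file and build the JSON payload from its contents."""
    with open(path, mode='r', encoding='utf-8') as f:
        return article_payload(title, f.read(), locale)


# The HTML can contain quotes and newlines; json.dumps escapes them
html_body = '<p class="tip">Use a "tripod"</p>\n<p>Shoot at a low ISO.</p>'
payload = article_payload('How to take pictures in low light', html_body)

# Decoding the payload gives back the original HTML unchanged
assert json.loads(payload)['article']['body'] == html_body
```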

  • Avatar
    David Conway

    I would be using the curl command on a per-document basis, unless you know of an easier way of doing this. After I run the make_backup.py Python script, I have a log file containing a lot of information like the title, the id of the section the article belongs to, the HTML filename of the article, and its position. I also have all of the articles in HTML format. I open these HTML files and see the article, its images, and video attachments. Are you saying the article (HTML file) has to be converted to JSON in order to be uploaded to HC? The script above downloaded the articles perfectly. Is there a similar script or curl command that would let me upload a document that has been deleted? I would have to locate the HTML file that represents the deleted article based on the title. Once the HTML file has been identified as the document to be uploaded, I would think curl or a Python script would do the trick.

  • Avatar
    Charles Nadeau

    Hi David,


    The backup script doesn't actually download the HTML files. It downloads JSON data, then decodes and writes the data to files (lines 27 to 33 in the "Code complete" sample above). 


    The API uses JSON as its data exchange format. The process of uploading articles to Help Center is the reverse of downloading the articles. Each article has to be converted to JSON and then sent to the API in that format. After the API receives the data, the JSON is decoded and published.


    In the cURL example you gave above, the "body" element is JSON (as is the entire "-d" line):


    -d '{"article": {"title": "How to take pictures in low light", "body": "Use a tripod", "locale": "en-us" }}'
    

    For more on encoding and decoding JSON, see the article Working with JSON.
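
    To make the "reverse of downloading" idea concrete, here is a hedged sketch of posting one article. It mirrors the curl example: same endpoint path, same Content-Type header, and a JSON string like the "-d" value as the payload. The function name restore_article is hypothetical, session is assumed to be an authenticated Requests session (or anything with a compatible post() method), and treating 201 as the success status with the created record under an article key is an assumption to verify against the API docs.

```python
def restore_article(session, zendesk, section_id, payload):
    """POST a JSON-encoded article to a section and return the created record."""
    endpoint = '{}/api/v2/help_center/sections/{}/articles.json'.format(zendesk, section_id)
    response = session.post(
        endpoint,
        data=payload,  # payload is already a JSON string
        headers={'Content-Type': 'application/json'},
    )
    if response.status_code != 201:  # assumption: 201 Created signals success
        raise RuntimeError(
            'Failed to create article with error {}'.format(response.status_code))
    return response.json()['article']
```

    Here zendesk is the base URL of your account, such as 'https://your_subdomain.zendesk.com'.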


    Hope this helps.

  • Avatar
    David Conway

    The code for backing up all of the Zendesk articles is great. Is Zendesk working on a Python script to import the HTML files, convert them to JSON, and then publish them? I have found that you can open the HTML file in a text editor and copy all of the tags that make up the body (content). Create a new article in Zendesk, choose the section where the article is to be placed, give it the same title as before, click the <|> source code button, and paste what you copied from the HTML file earlier. The article shows up fine with images and any embedded videos. Easy. I just wanted to see if Zendesk was going to create a Python script to import the backed-up files generated by the make_backup.py script. Thanks.

  • Avatar
    Jasper

    Thanks for this! I was able to back up the majority of articles.

    Apart from the articles which are publicly available, we also have some sections which are only available to logged-in users. These articles don't seem to be included in the backup. Is there any way to include them?

  • Avatar
    Charles Nadeau

    Hi Jasper,

    The articles returned by the API depend on the user role of the person making the API request. The API returns only the articles that the requesting agent, end user, or anonymous user can normally view in Help Center when using the web UI. To back up a Help Center with restricted content, you should ideally have the user role of Help Center manager to get all the content. 

    Administrators are Help Center managers by default. You can add Help Center managers by giving agents Help Center manager privileges. See Understanding Help Center roles and setting permissions.

    Let me know if that's not the problem.

    Charles

  • Avatar
    Jasper

    Hi Charles,

    I already have the role of Administrator, so that's probably not the problem.

    Is there anything else I can do to solve this? 
